Preface


Every mathematician uses the real number system, but mathematics students are 
seldom told what it is. The typical undergraduate real analysis course, which is 
supposed to explain the foundations of calculus, usually assumes a definition of 
R, or else relegates it to an appendix. By failing to reach the real foundation (pun 
intended), real analysis runs the risk of looking like a mere rerun of calculus, but 
with more tedious proofs. A serious look at the real numbers, on the other hand, 
opens the eyes of students to a new world—a world of infinite sets, where the need 
for new ideas and new methods of proof is obvious. Not only are theorems about 
the real numbers interesting in themselves, they fit into the fundamental concepts of 
real analysis—limits, continuity, and measure—like a hand in a glove. 

However, any book that revisits the foundations of analysis has to reckon with the 
formidable precedent of Edmund Landau’s Grundlagen der Analysis (Foundations 
of Analysis) of 1930. Indeed, the influence of Landau’s book is probably the reason 
that so few books since 1930 have even attempted to include the construction of the 
real numbers in an introduction to analysis. On the one hand, Landau’s account is 
virtually the last word in rigor. The only way to be more rigorous would be to rewrite 
Landau’s proofs in computer-checkable form—which has in fact been done recently. 
On the other hand, Landau’s book is almost pathologically reader-unfriendly. In his 
Preface for the Student he says “Please forget everything you have learned in school; 
for you haven’t learned it,” and in his Preface for the Teacher “My book is written, 
as befits such easy material, in merciless telegram style.” While memories of Landau
still linger, so too does fear of the real numbers. 

In my opinion, the problem with Landau’s book is not so much the rigor (though
it is excessive) as the lack of background, history, examples, and explanatory
remarks, and the fact that he does nothing with the real numbers except construct
them. In short, it could be an entirely different story if it were explained that the real
numbers are interesting! This is what I have tried to do in the present book. 

In fact the real numbers perfectly exemplify the saying of Carl Ludwig Siegel 
that the mathematical universe is inhabited not only by important species but also by 
interesting individuals. There are interesting individual numbers (such as √2, e, and
π), interesting sets of real numbers (such as the Cantor set, Vitali’s nonmeasurable
set), and even interesting sets of which no interesting member is known (such as the
set of normal numbers). All of these examples were known in 1930, but in recent 
decades they have been joined by many new exotic sets arising from the study of 
fractals, chaos, and dynamical systems. 

The exotic sets arising from dynamical systems are one reason, I believe, to shift 
the emphasis of analysis somewhat from functions to sets. Of course, we are still 
interested in sequences of numbers and sequences of functions, and their limits. But 
now it seems equally reasonable to study sequences of sets, since many interesting 
sets, such as the Cantor set, arise as their limits. Another reason is simply the great 
advances made by set theory itself in recent decades, many of them motivated by 
the desire to better understand the real numbers. These advances are too technical 
for us to discuss in detail, but they result from the fundamental fact that analysis is 
based on uncountable sets and the struggle to understand this fact. 

The set of real numbers is the first, and still the most interesting, example of 
an uncountable set. The second example is the set of countable ordinals. It is less 
familiar to most mathematicians, but also of great importance in analysis. If analysis 
is taken to be the study of limit processes, then countable ordinals are the numbers 
that measure the complexity of functions and sets defined as limits of sequences. 
In particular, we assign the lowest level of complexity (zero) to the continuous 
functions, the next level of complexity (one) to the functions that are not continuous 
but are limits of continuous functions, complexity level two to functions that are not 
of level one but are limits of functions of level one, and so on. It turns out that there 
are functions of all levels 0, 1, 2, 3, ... and beyond, because one can find a sequence
of functions f₀, f₁, f₂, ..., respectively of levels 0, 1, 2, ..., whose limit is not at any
of these levels. This calls for a transfinite number, called ω, to label the first level
beyond 0, 1, 2, ....


The transfinite numbers needed to label the levels of complexity obtainable by 
limit processes not only make up an uncountable set: in fact they make up the 
smallest uncountable set. Thus, the raw materials of analysis—real numbers and 
limits—lead us to two uncountable sets that are seemingly very different. Whether 
these two sets are actually related—specifically, whether there is a bijection between 
the two—is the fundamental problem about real numbers: the continuum problem. 
The continuum problem was number one on Hilbert’s famous list of mathematical 
problems of 1900, and it still has not been solved. However, it has had enormous 
influence on the development of set theory and analysis. 

The above train of thought explains, I hope, why the present book is about set 
theory and analysis. The two subjects are too closely related to be treated separately, 
even though the usual undergraduate curriculum tries to do so. The typical set theory 
course fails to explain how set concepts are relevant to analysis—even seemingly 
abstruse ones such as different axioms of choice and large cardinals. And the typical 
real analysis course fails to address the set issues that arise inevitably from the real 
numbers, and from measure theory in particular. When the two subjects are treated 
together one gets (almost) two courses for the price of one. 

The book expands some of the material in my semi-popular book Roads to 
Infinity (Stillwell 2010) in textbook format, with more complete proofs, exercises to
reinforce them, and strengthened connections with analysis. The historical remarks,
in particular, explain how the concepts of real number and infinity developed to meet 
the needs of analysis from ancient times to the late twentieth century. 

In writing the book, I had in mind an audience of senior undergraduates who have 
studied calculus and other basic mathematics. But I expect it will also be useful to 
graduate students and professional mathematicians who until now have been content 
to “assume” the real numbers. I would not go as far as Landau (“please forget 
everything you have learned in school; for you haven’t learned it”) but I believe
it is enlightening, and fun, to learn something new about the real numbers. 

My thanks go to José Ferreiros and anonymous reviewers at Springer for 
corrections and helpful comments, and to my wife Elaine for her usual tireless 
proofreading. I also thank the University of San Francisco and Monash University 
for their support while I was researching and writing the book. 


San Francisco, CA, USA John Stillwell


Chapter 1
The Fundamental Questions 


PREVIEW 


On the historical scale, analysis is a modern discipline with ancient roots. The 
machinery of analysis—the calculus—is a fusion of arithmetic with geometry that 
has been in existence for only a few hundred years, but the problem of achieving 
such a fusion is much older. The problem of combining arithmetic and geometry 
occurs in Euclid’s Elements, around 300 BCE, and indeed Euclid includes several of 
the ideas that we use to solve this problem today. 

In this preliminary chapter we introduce the basic problems arising from attempts 
to reconcile arithmetic with geometry, by discussing certain fundamental questions 
such as: 


• What are numbers?
• What is the line?
• What is geometry?


The ancient Greeks discovered the basic difficulty in reconciling arithmetic with 
geometry, namely, the existence of irrationals. Irrationals are needed to fill gaps in 
the naive concept of number, and these gaps can only be filled by admitting infinite 
processes into mathematics. Thus, to develop a number concept complete enough 
for calculus, we need a theory of infinity. The development of such a theory will be 
the subject of later chapters. 


1.1 A Specific Question: Why Does ab = ba? 


This question is not as trivial as it looks. Even if we agree that a and b are numbers, 
and ab is the product of a and b, we still have to agree on the meaning of numbers 
and the meaning of product—and these turn out to be deep and fascinating issues. 
To see why, consider how ab was understood from the time of ancient Greece until 
about 1860.


Fig. 1.1 The product ab of lengths a and b 


Fig. 1.2 The rectangle picture of (a + b)²


In Greek mathematics, and in Euclid’s Elements in particular, quantities a and 
b were viewed as lengths, and their product ab was taken to be the rectangle with 
perpendicular sides a and b (Fig. 1.1).

Then it is completely obvious that ab = ba, because the rectangle with 
perpendicular sides b and a is the same as the rectangle with perpendicular sides 
a and b. It was so obvious that Euclid did not bother to point it out and, to the 
Greeks, ab = ba was probably not an interesting fact, because it was true virtually 
by definition. 

This could be considered a virtue of the rectangle definition; it makes the basic 
algebraic properties of the product available at a glance, so that one does not need 
to think about them. One such property, which Euclid did point out, is the formula 
that we write as 


(a + b)² = a² + 2ab + b².


Many a beginning algebra student thinks that (a + b)² = a² + b², but this mistake
will not be made by anyone who looks at the rectangle picture of (a + b)² (Fig. 1.2).
Clearly, the square with side a + b consists of a square a² with side a, a square b²
with side b, but also two rectangles ab. Hence (a + b)² = a² + 2ab + b². The Greeks
were so fond of this picture that they even stamped it on coins! Figure 1.3 shows an 
example from the Greek island of Aegina, from around 400 BCE, even before the 
time of Euclid. 

This is “algebra,” but not as we know it. It runs alongside our algebra up to 
products of three lengths, but refuses to go further. The product of lengths a, b, and 
c was interpreted by the Greeks as a box with perpendicular sides a, b, and c. This
interpretation agrees with ours, and makes it possible to visualize results such as
(a + b)³ = a³ + 3a²b + 3ab² + b³. But what is the product of four lengths a, b, c,
and d? To the Greek way of thinking there was no such thing, because we cannot
imagine four lines in mutually perpendicular directions.

Fig. 1.3 Aegina coin

Thus, the Greek interpretation of numbers as lengths and products as rectangles 
or boxes has its limitations. Nevertheless, it remained as the mental picture of 
products long after the Greek concept of length was replaced by a general concept 
of number (see the next section for more on this development). For example, here 
is a passage from Newton (1665) in which even the product of whole numbers is 
described as their “rectangle”: 


For yᵉ number of points in wᶜʰ two lines may intersect can never bee greater yⁿ yᵉ rectangle
of yᵉ number of their dimensions.


Here the “lines” are what we would call algebraic curves, and their “dimensions” 
are their degrees, which are whole numbers. (Also, it should probably be pointed out 
that the “y” in Newton’s time is “th” in modern English.) Finally, as late as 1863, the 
great number theorist Dirichlet appealed to the rectangle picture in order to explain 
why ab = ba for whole numbers a and b. On page 1 of his Lectures on Number
Theory he asks the reader to imagine objects arranged in a rows of b objects, or in b 
rows of a objects, and to realize that the number of objects is the same in each case. 

Surely, nothing could be clearer. Nevertheless, it is surprising that the same idea 
applies to two vastly different kinds of quantity: lengths, which vary continuously, 
and whole numbers, which vary discretely, or in jumps. Finding a concept of 
number that embraces these two extremes is a long journey, which results in a new 
understanding and appreciation of the law ab = ba. It will take two chapters to 
complete, and the remainder of this chapter outlines the obstacles that have to be 
overcome. 


Exercises 


1.1.1 Give pictorial versions of the distributive law a(b + c) = ab + ac, and the identity
a² − b² = (a − b)(a + b).
1.1.2 Also explain why (a + b)³ = a³ + 3a²b + 3ab² + b³, with a picture of a suitable cube.


1.2 What Are Numbers?


Numbers answer two subtly different questions: how many and how much? The first
is the simpler question, answered by the natural numbers

0, 1, 2, 3, 4, 5, ....

The natural numbers originated for the simple purpose of counting, but they
somehow developed an intricate structure, with operations of addition and
multiplication and (partially) subtraction and division. The subtraction operation invites
an extension of the natural numbers to the integers,

..., −3, −2, −1, 0, 1, 2, 3, ...,

so that subtraction becomes fully defined. And the division operation invites an
extension of the integers to the rational numbers m/n for all integers m and n ≠ 0,
so that division is defined for all nonzero rational numbers.

You will know the rules for operating on natural numbers, integers and rational 
numbers from elementary school but, almost certainly, no underlying reason for 
the rules will have been given. In Sect. 2.2 we will show that all the rules for
operating on numbers stem from their original purpose of counting, whereby all 
natural numbers originate from 0 by repeatedly adding 1. 

You will also know from school that the rational numbers give an approximate 
answer to the second question: how much? This is because quantities such as 
length, area, mass, and so on, can be measured to arbitrary precision by rational 
numbers. Indeed, we can measure to arbitrary precision by finite decimals, or 
decimal fractions, which are rational numbers of the form m/10ⁿ, where m and n are
integers. (For example, 3.14 = 314/10².) But arbitrary precision is not exactness,
and some quantities are not exactly equal to any rational number. 

The most famous example is the length, √2, of the diagonal of the unit square.
The best-known proof goes as follows.


Irrationality of √2. There is no rational number whose square equals 2.


Proof. Suppose on the contrary that 2 = m²/n² for some positive integers m and n.
Then we have the following series of implications.


2 = m²/n²  ⇒  2n² = m²       (multiplying both sides by n²)
           ⇒  m² is even
           ⇒  m is even       (because the square of an odd number is odd)
           ⇒  m = 2m′         (for some natural number m′)
           ⇒  2n² = (2m′)²    (substituting 2m′ for m in second line)
           ⇒  n² = 2m′²       (dividing both sides by 2)
           ⇒  n is even       (by same argument as used above for m)
           ⇒  n = 2n′         (for some natural number n′)
           ⇒  2 = m′²/n′²     (where m′ < m and n′ < n).


So, for any pair m, n of natural numbers with 2 = m²/n² there is a smaller pair
m′, n′ with the same property, and we therefore have an infinite descending sequence
of natural numbers, which is impossible. □
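
The descent argument disposes of every pair m, n at once, which no finite computation can do. A quick search is still a useful sanity check on the statement. The following Python sketch looks for positive integers m, n with m² = 2n²; the function name and the search bound are illustrative choices, not part of the text.

```python
from math import isqrt

def rational_square_roots_of_two(max_denominator):
    """Return all pairs (m, n) with 1 <= n <= max_denominator and m*m == 2*n*n.

    For each n, the only possible candidate is m = floor(sqrt(2*n*n)),
    so one exact integer test per denominator suffices.
    """
    hits = []
    for n in range(1, max_denominator + 1):
        m = isqrt(2 * n * n)          # exact integer square root, no rounding error
        if m * m == 2 * n * n:
            hits.append((m, n))
    return hits

# The theorem says this list is empty for every bound; the search only
# confirms it for the denominators actually tried.
print(rational_square_roots_of_two(10_000))   # []
```

The search uses integer arithmetic only, so it is not affected by floating-point error, but it checks only the denominators up to the chosen bound. The proof above is what covers all of them.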


Thus, geometry demands irrational numbers. This discovery threw arithmetic 
into confusion, because it is not clear how to add and multiply irrational numbers. 
For example, is it true that √2 × √3 = √6? Also, is there an arithmetic definition of
multiplication compatible with geometry, where √2 × √3 measures the area of the
rectangle with adjacent sides √2 and √3? We take up these questions in Sect. 2.4.


Exercises 


If one attempts to prove that √3 is irrational by supposing that √3 = m/n and reasoning as above,
one reaches the equation m² = 3n². It no longer follows that m² is even.


1.2.1 What property of m² does follow from the equation m² = 3n²?
1.2.2 Use this property to devise a proof that √3 is irrational.
1.2.3 Also give a proof that √5 is irrational.


1.3 What Is the Line?


More precisely, what are points, and how do they fill the line? Or, how do we make 
a continuum from points? We would like to say that points on the line are numbers, 
but it is hard to recreate the uniform and unbroken quality of the line from our 
fragmentary perception of individual numbers. It is possible, certainly, to visualize 
the integer points on the line (Fig. 1.4).

Fig. 1.4 Integer points on the line

Extending this vision to all the rational points is already a challenging task,
because the rational points are dense, i.e., there are infinitely many of them in
any interval of the line, no matter how small. One way to cope with density is to
consider the integer points ⟨m, n⟩ of the plane, and to view the rational numbers
as the (slopes n/m of) lines from ⟨0, 0⟩ to other integer points ⟨m, n⟩ with m ≠ 0.¹
(Figure 1.5 shows some of the positive rational numbers as slopes of lines from the
origin to integer points in the first quadrant.)

Fig. 1.5 Slopes to integer points on the plane

This view also includes the irrational numbers in the form of lines through ⟨0, 0⟩
that miss all other integer points in the plane. 

However, while it is nice to visualize the density of the rationals, and the gaps in 
them that correspond to irrationals, this picture brings us no closer to an arithmetic 
definition of the points on the line. For this, we need infinite concepts of some 
kind, so as to approach the irrational points via the rationals. The most familiar 
is probably the concept of the infinite decimal, which embraces both rational and 
irrational points in a uniform way. Infinite decimals extend finite decimals in a 
natural way and, like the finite decimals, they have a clear ordering corresponding 
to the ordering of points on the line. 

As we said in Sect. 1.2, a finite decimal is a rational number of the form m/10ⁿ,
which we write by inserting a decimal point before the last n digits of the decimal
numeral for m. Thus, 3.14 is the decimal form of 314/10². An infinite decimal,
such as


1.414213...,


represents the limit of the finite decimals


1.4, 1.41, 1.414, 1.4142, 1.41421, 1.414213, ...,


that is, the number “approached” by these finite decimals as the number of decimal
places increases. We will define the concept of limit formally in Sect. 2.6, and simply
assume some familiarity with infinite decimals for the present.

¹In this book we use ⟨ and ⟩ to bracket ordered pairs, triples, and so on. This is to avoid confusion
with the notation (a, b), which will later be used for the open interval of points x such that
a < x < b.

It is intuitively plausible that each point on the line corresponds to an infinite 
decimal, because we can find the successive decimal places of any point P by 
starting with the integer interval containing P, dividing the interval into 10 equal 
parts to find the first decimal place of P, dividing that subinterval into 10 equal parts 
to find the second decimal place, and so on. It is also plausible that different points 
P and Q will have different decimals, because repeated subdivision will eventually 
“separate” P from Q—they will eventually fall within different parts of the nth 
subdivision, and hence differ in the nth decimal place. Moreover, we can decide 
which of P, Q is less from their decimals—the lesser point is the one with the lesser 
digit in the first decimal place where they differ. 
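
The subdivision procedure just described can be tried out on a point that is specified only by comparisons with rationals. In the Python sketch below, the function `decimal_digits` and the comparison predicates are hypothetical names introduced for illustration. A point P is given by a test that says whether a rational q lies at or to the left of P, and the integer interval containing P is assumed to be known in advance.

```python
from fractions import Fraction

def decimal_digits(at_or_left_of_p, integer_part, places):
    """Find decimal places of a point P by repeated division into 10 parts.

    at_or_left_of_p(q) must return True exactly when the rational q <= P.
    integer_part is assumed to satisfy integer_part <= P < integer_part + 1.
    """
    left = Fraction(integer_part)     # left end of the current subinterval
    width = Fraction(1)               # its width
    digits = []
    for _ in range(places):
        width /= 10                   # divide the subinterval into 10 equal parts
        # the digit is the last dividing point that is still at or left of P
        digit = max(k for k in range(10) if at_or_left_of_p(left + k * width))
        digits.append(digit)
        left += digit * width
    return digits

# For nonnegative q, q <= sqrt(2) exactly when q*q <= 2 (and similarly for 3).
sqrt2 = decimal_digits(lambda q: q * q <= 2, 1, 6)
sqrt3 = decimal_digits(lambda q: q * q <= 3, 1, 6)
print(sqrt2)          # [4, 1, 4, 2, 1, 3]
print(sqrt3)          # [7, 3, 2, 0, 5, 0]
print(sqrt2 < sqrt3)  # True: the first differing digit decides the order
```

Python compares lists by the first position where they differ, which is the ordering rule for decimals stated above when the integer parts agree. Exact rational arithmetic is used so that no rounding enters the comparisons.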

Thus, infinite decimals give a simple numerical representation of points on the 
line, which is particularly convenient for describing the ordering of points. However, 
infinite decimals are not convenient for describing addition and multiplication, 
so they are not a useful solution of the problem of defining addition and
multiplication of irrational numbers. We will solve the latter problem differently in
Sect. 2.4.


Exercises 


Infinite decimals are also good for distinguishing (theoretically) between rational and irrational 
numbers: the rational numbers are those with ultimately periodic decimals. To see why any 
ultimately periodic decimal represents a rational number one uses an easy computation with 
decimals; namely, multiplication by 10 (possibly repeated), which shifts all digits one place to 
the left, and subtraction. 


1.3.1 If x = 0.37373737..., express x as a ratio of integers. 

1.3.2 If y = 0.519191919..., express y as a ratio of integers.

1.3.3 By generalizing the idea of the previous exercises, explain why each ultimately periodic
decimal represents a rational number. 

1.3.4 Find the decimals for 1/6 and 1/7. 

1.3.5 By means of the division process, or otherwise, explain why each rational number has an 
ultimately periodic decimal. 
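
For readers who want to experiment with the division process before writing an explanation, here is a minimal Python sketch. The function name and the example fractions are illustrative choices, and positive integers are assumed. It performs ordinary long division and records each remainder, stopping as soon as a remainder is zero or repeats.

```python
def decimal_expansion(numerator, denominator):
    """Long division of numerator/denominator (positive integers assumed).

    Returns (integer part, non-repeating digits, repeating digits).
    """
    integer_part, remainder = divmod(numerator, denominator)
    digits = []
    first_seen = {}                   # remainder -> position where it first occurred
    while remainder != 0 and remainder not in first_seen:
        first_seen[remainder] = len(digits)
        digit, remainder = divmod(remainder * 10, denominator)
        digits.append(digit)
    if remainder == 0:                # the division terminated
        return integer_part, digits, []
    start = first_seen[remainder]     # the digits from here on repeat forever
    return integer_part, digits[:start], digits[start:]

print(decimal_expansion(3, 14))   # (0, [2], [1, 4, 2, 8, 5, 7]), i.e. 0.2142857142857...
print(decimal_expansion(1, 8))    # (0, [1, 2, 5], []), i.e. 0.125
```

The loop must stop, because every remainder is less than the denominator and so only finitely many different remainders can occur.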


The picture of integer points in the plane, used above to visualize rational numbers, has an 
interesting extension. 


1.3.6 If each rational point in the plane is surrounded by a disk of fixed size ε, show that there is
no line from ⟨0, 0⟩ that misses all other disks.

1.3.7 Conclude that, if space were filled uniformly with stars of uniform size, the whole sky would 
be filled with light (the Olbers paradox).


1.4 What Is Geometry?


In a trivial sense, numbers are related to geometry because the real numbers are 
motivated by our mental image of the line. However, the geometry of the line is 
not very interesting, compared with the geometry of the plane. What properties of 
numbers, if any, are relevant to the geometry of the plane? The short answer is: 
Pythagorean triples. 

These are whole number triples ⟨a, b, c⟩ such that a² + b² = c² or, equivalently,
pairs of whole numbers ⟨b, c⟩ such that c² − b² is a perfect square. A list of such pairs
occurs in a clay tablet known as Plimpton 322, which was inscribed around 1800 BCE
in ancient Mesopotamia. Table 1.1 shows these pairs in modern notation, together
with the number a = √(c² − b²), and a fraction x that will be explained later.

The original tablet lists the pairs ⟨b, c⟩ in the order given in Table 1.1 (part of it
is broken off) but not the numbers a, without which the numbers b and c look
almost random and meaningless. The first person to realize that the pairs ⟨b, c⟩ are
mathematically significant was the mathematics historian Otto Neugebauer, who in
1945 noticed that √(c² − b²) is a whole number in each case. [See Neugebauer and
Sachs (1945).] This led him to suspect that Plimpton 322 was really a table of triples
⟨a, b, c⟩ with the property


a² + b² = c².

Such triples are called Pythagorean because of the famous Pythagorean theorem
asserting that a² + b² = c² holds in any right-angled triangle with sides a, b and
hypotenuse c. It can hardly be a coincidence that the numbers b, c have the numerical
property that √(c² − b²) is a whole number, but was the compiler of Plimpton 322
thinking about triangles? What makes this virtually certain is that the ratio b/a (the
“slope of the hypotenuse”) decreases steadily, and in roughly equal steps, from just
below the 45° slope to just above the 30° slope. Thus, the triples are geometrically
ordered, and geometrically bounded, in a natural way. Moreover, it has been pointed
out by Christopher Zeeman (see exercises below) that there are just 16 triangles
between these bounds, subject to a certain “simplicity” condition, and Plimpton 322
contains the first 15 of them. Figure 1.6 shows the shapes of the 15 triangles in
question.


Table 1.1 Numbers in Plimpton 322, and related numbers

     a        b        c        x
   120      119      169     12/5
 3,456    3,367    4,825    64/27
 4,800    4,601    6,649    75/32
13,500   12,709   18,541   125/54
    72       65       97      9/4
   360      319      481     20/9
 2,700    2,291    3,541    54/25
   960      799    1,249    32/15
   600      481      769    25/12
 6,480    4,961    8,161    81/40
    60       45       75        2
 2,400    1,679    2,929    48/25
   240      161      289     15/8
 2,700    1,771    3,229    50/27
    90       56      106      9/5


Fig. 1.6 Triangle shapes from Plimpton 322

Thus, Neugebauer’s discovery suggests that the numbers in Plimpton 322 have 
a geometric meaning, and that the Pythagorean theorem was known long before 
Pythagoras, who lived around 500 BCE. 

The ordering of triples ⟨a, b, c⟩ by the ratios b/a makes it clear that positive
rational numbers (ratios of positive integers) were also part of ancient mathematical
thinking. So it is reasonable to suppose that rational numbers may have been
involved in the generation of the pairs ⟨b, c⟩ in Plimpton 322, and that may explain
how huge pairs such as ⟨12709, 18541⟩ could be discovered. One way to do this is
via the fractions x appended in the last column of the table. The fractions x “explain”
the triples ⟨a, b, c⟩ in the sense that they yield each a, b and c by the formulas


b/a = ½(x − 1/x),    c/a = ½(x + 1/x),


and each x is considerably shorter than the triple it generates. Moreover, the fractions
x and 1/x are noteworthy because each has a finite expansion in base 60, the
number system used in ancient Mesopotamia. This is because the numerator and
denominator of each factorizes into powers of 2, 3, or 5. These properties are
explored more thoroughly in the exercises below.
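
The way a fraction x yields the ratios b/a and c/a can be checked mechanically with exact rational arithmetic. The Python sketch below uses a hypothetical helper name and compares the results with three rows copied from Table 1.1. It covers only those rows and is not a claim about how the tablet was actually compiled.

```python
from fractions import Fraction

def ratios_from_x(x):
    """Return (b/a, c/a) computed from x by the two formulas above."""
    x = Fraction(x)
    return (x - 1 / x) / 2, (x + 1 / x) / 2

# Three rows of Table 1.1, written as (a, b, c, x).
rows = [
    (120, 119, 169, Fraction(12, 5)),
    (13500, 12709, 18541, Fraction(125, 54)),
    (60, 45, 75, Fraction(2)),
]

for a, b, c, x in rows:
    slope, hypotenuse = ratios_from_x(x)
    assert slope == Fraction(b, a)        # b/a agrees with the table
    assert hypotenuse == Fraction(c, a)   # c/a agrees with the table
    assert a * a + b * b == c * c         # and the triple is Pythagorean
print("all three rows agree")
```

Note that the formulas determine only the ratios. For x = 2 they give 3/4 and 5/4, while the table lists the triple as 60, 45, 75, which is 15 times the triple 4, 3, 5.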

Like the Mesopotamians, the Pythagoreans were struck by the presence of 
whole numbers, and their ratios, in geometry (and elsewhere, particularly music). 
However, as we saw in Sect. 1.2, they discovered gaps in the rational numbers, 
which makes the whole program of using numbers in geometry problematic. The 
gap at √2 is also conspicuous by its absence from Plimpton 322: the hypotenuse
lines of slope b/a stop just short of the line of slope 1 corresponding to the diagonal 
of a square. 


Exercises 


We now explore how each Pythagorean triple ⟨a, b, c⟩ in Plimpton 322 is “explained” by the
fraction x in the last column of Table 1.1. Notice that x generally has the same number of digits as
each of the numbers a, b, c, so all three of these numbers can be encoded by a number of the same
“length” as any one of them. Also, we will see that in each case x is “simple” in a certain sense.


For each line in the table, ½(x − 1/x) = b/a.


1.4.1 Check that ½(x − 1/x) = 119/120 when x = 12/5.


1.4.2 Also check, for three other lines in the table, that ½(x − 1/x) = b/a.


The numbers x are not only “shorter” than the numbers b/a, they are “simple” in the sense that
they are built from the numbers 2, 3, and 5. For example


12/5 = (2² × 3)/5   and   125/54 = 5³/(2 × 3³).


1.4.3 Check that every other fraction x in the table can be written with both numerator and
denominator as a product of powers of 2, 3, or 5.


The formula ½(x − 1/x) = b/a gives us whole numbers a and b from a number x. Why should
there be a whole number c such that a² + b² = c²?


1.4.4 Verify by algebra that [½(x − 1/x)]² + 1 = [½(x + 1/x)]².
1.4.5 Deduce from Exercise 1.4.4 that a² + b² = c², where ½(x + 1/x) = c/a.


1.4.6 Check that the formula in Exercise 1.4.5 gives c = 169 when x = 12/5 (the first line of the 
table), and also check three other lines in the table. 


In a 1995 talk at the University of Texas at San Antonio, Christopher Zeeman showed that 
there are exactly 16 fractions x with denominator less than 60 and numerator and denominator 
composed of factors 2, 3, and 5 that give slope b/a corresponding to an angle between 30° and 
45°. The Pythagorean triples in Plimpton 322 correspond to the first 15 slopes (in decreasing order) 
obtainable from these values of x.

1.4.7 The slope not included in Plimpton 322 comes from the value x = 16/9. Show that the
corresponding slope is 175/288, and that the corresponding angle is a little more than 30°. 
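
Zeeman's count can be reproduced by a short search. The Python sketch below rests on stated assumptions about how to read the condition: x is taken in lowest terms, "between 30° and 45°" is read as strict inequality, and the slope is computed as ½(x − 1/x). The function names are illustrative.

```python
from fractions import Fraction
from math import gcd

def is_built_from_2_3_5(n):
    """True when the positive integer n has no prime factor other than 2, 3, 5."""
    for p in (2, 3, 5):
        while n % p == 0:
            n //= p
    return n == 1

def zeeman_fractions(denominator_bound=60):
    """Fractions x = p/q in lowest terms, q < denominator_bound, p and q built
    from 2, 3 and 5, whose slope (x - 1/x)/2 lies strictly between
    tan 30 degrees = 1/sqrt(3) and tan 45 degrees = 1."""
    found = []
    for q in range(1, denominator_bound):
        if not is_built_from_2_3_5(q):
            continue
        for p in range(q + 1, 3 * q):     # slope < 1 forces x < 1 + sqrt(2) < 3
            if gcd(p, q) != 1 or not is_built_from_2_3_5(p):
                continue
            x = Fraction(p, q)
            slope = (x - 1 / x) / 2
            if slope < 1 and 3 * slope * slope > 1:   # exact test for slope > 1/sqrt(3)
                found.append(x)
    return sorted(found, reverse=True)    # decreasing x means decreasing slope

xs = zeeman_fractions()
print(len(xs))    # 16
print(xs[-1])     # 16/9, the value not found on the tablet
```

Under these assumptions the first 15 values, in decreasing order, are the entries of the last column of Table 1.1 in the order listed there.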


1.5 What Are Functions? 


Having seen how the rational numbers can be “completed” to the line of all real 
numbers, we might hope that there are similar “rational functions” among all the 
real functions. There are indeed rational functions. They are the functions that are 
quotients of polynomials 


p(x) = a₀ + a₁x + a₂x² + ⋯ + aₙxⁿ, where a₀, a₁, a₂, ..., aₙ are real numbers.


The polynomial functions play the role of integers, and they are sometimes called 
“integral rational functions.” 

Just as rational numbers fail to exhaust all numbers, rational functions fail to 
exhaust all functions, and indeed there are many naturally occurring functions 
that are not rational, such as sin x and cos x. We can see that the function sin x
is not rational, because it is zero for infinitely many values of x, namely, x =
0, ±π, ±2π, ±3π, .... A rational function, on the other hand, is zero only when its
numerator is zero. This happens for at most n values of x, where n is the degree of 
the numerator, by the fundamental theorem of algebra. 

The functions sin x and cos x are nevertheless limits of rational functions, and 
indeed of polynomials, because 


sin x = x − x³/3! + x⁵/5! − x⁷/7! + ⋯,
cos x = 1 − x²/2! + x⁴/4! − x⁶/6! + ⋯.


Thus, sin x is the limit of the sequence of polynomials


x,   x − x³/3!,   x − x³/3! + x⁵/5!,   ...,


just as √2 is the limit of the sequence of rationals 1, 1.4, 1.41, 1.414, .... The infinite
series that occur as limits of polynomials are called power series, and they form an 
important class of functions. It is tempting to think that power series “complete” 
the rational functions in the same way that the real numbers complete the rational 
numbers, and indeed Newton was extremely impressed by the analogy: 


I am amazed that it has occurred to no one ... to fit the doctrine recently established for
decimal numbers in similar fashion to variables, especially since the way is then open to
more striking consequences. For since this doctrine in species has the same relationship
to Algebra that the doctrine in decimal numbers has to common Arithmetic, its operations
of Addition, Subtraction, Multiplication, Division and Root extraction may be easily learnt
from the latter’s.


Newton (1671), p. 35.


Fig. 1.7 Modes of vibration


Power series do indeed vastly increase the range of functions to which algebraic 
operations apply, and they are also subject to the calculus operations of
differentiation and integration, as Newton was well aware. Nevertheless, power series by
no means exhaust all possible functions. A much wider class of functions came to 
light in the eighteenth century when mathematicians investigated the problem of the 
vibrating string. 
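
The sense in which sin x is a limit of polynomials, as described earlier in this section, can be observed numerically. The following Python sketch evaluates the polynomial partial sums of the sine series and compares them with the library value of sin x. The function name is an illustrative choice, and floating-point arithmetic is used, so the agreement is only up to rounding error.

```python
import math

def sin_partial_sum(x, terms):
    """Sum of the first `terms` terms of x - x^3/3! + x^5/5! - x^7/7! + ..."""
    total = 0.0
    term = x                      # the k = 0 term, x^1/1!
    for k in range(terms):
        total += term
        # next term: multiply by -x^2 / ((2k+2)(2k+3))
        term *= -x * x / ((2 * k + 2) * (2 * k + 3))
    return total

x = 1.0
for n in (1, 2, 3, 4, 10):
    print(n, sin_partial_sum(x, n), abs(sin_partial_sum(x, n) - math.sin(x)))
```

Each polynomial in the sequence is a rational function, so this is the function analogue of approaching √2 by the finite decimals 1, 1.4, 1.41, 1.414, and so on.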

A taut string with two fixed ends has many simple modes of vibration, the first 
few of which are shown in Fig. 1.7. 

The shape of the first mode is one-half of the sine curve y = sin x (scaled down 
in the y dimension), the shape of the second is y = sin 2x, and so on. These simple 
modes correspond to simple tones, of higher and higher pitch as the number of 
waves increases, and they may be summed to form compound tones corresponding 
to (possibly infinite) sums 


b₁ sin x + b₂ sin 2x + b₃ sin 3x + ⋯.


Fig. 1.8 Triangular wave


Fig. 1.9 Approximations to the triangle wave


It seems, conversely, that an arbitrary continuous wave form may be realized by 
such a sum. This was first conjectured by Bernoulli (1753), and his remark led to 
the realization that trigonometric series were even more arbitrary than power series. 
For example, a power series may be differentiated at any point, so the corresponding 
curve has a tangent at any point. This is not the case for the “triangular wave” shown 
in Fig. 1.8, which has no tangent at its highest points. 

However, the triangular wave is the sum of the infinite series 


$$\sin x - \frac{\sin 3x}{9} + \frac{\sin 5x}{25} - \frac{\sin 7x}{49} + \cdots$$


Figure 1.9 shows the sums of the first one, two, three, and four terms of this series, 
and how they approach the triangular wave shape. This discovery made it acceptable 
to consider any continuous graph (on a finite interval) as the graph of a function, 
because one could express any such graph by a trigonometric series. 
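
The approach of the partial sums to the triangular wave can be checked numerically. The following Python sketch is an illustration only: the name `triangle_partial_sum` is ours, and the limiting peak height π²/8 mentioned in the comment is a standard value for this series that the text itself does not state.

```python
import math

def triangle_partial_sum(x: float, terms: int) -> float:
    """Sum of the first `terms` terms of sin x - sin 3x/9 + sin 5x/25 - ..."""
    total = 0.0
    for k in range(terms):
        odd = 2 * k + 1
        total += (-1) ** k * math.sin(odd * x) / odd ** 2
    return total

# Heights of the first four partial sums at x = pi/2, the highest point of the wave.
# At this point every term is positive, so the heights increase steadily
# (towards pi**2 / 8, a standard value assumed here rather than proved).
peak_heights = [triangle_partial_sum(math.pi / 2, n) for n in (1, 2, 3, 4)]
print(peak_heights)
```

The terms shrink like 1/n², which is why a handful of them already gives a recognizable triangle, while the corner at the top is only reached in the limit.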

Indeed, why insist that “functions” be continuous? Why not allow the following 
rule to define a function? (It is known as the Dirichlet function.) 
$$d(x) = \begin{cases} 1 & \text{if } x \text{ is rational} \\ 0 & \text{if } x \text{ is irrational.} \end{cases}$$


This rule defines a unique value of d(x) for each x, which is perhaps all we need 
ask of a function. Interesting though it may be to find formulas for functions (and 
indeed it is especially interesting for d(x), as we will see later), the essence of a 
function f is the mere existence of a unique value f(x) for each x. We can even 
strip the concept of “rule” off the concept of function by making the following 
definition. 


Definition. A function f is a set of ordered pairs (x, y) that includes at most one 
pair (x, y) for each x, in which case we say y = f(x). The set of x values occurring 
in the ordered pairs forms the domain of f, and the set of y values forms its range. 


Reducing the concept of function to that of set, as we have done here, is part of a 
view of mathematics that we will generally follow in this book—that everything is a 
set. We will see later that mathematics can be built from the ground up according to 
this view, starting with the natural numbers. Of course, it is not practical constantly 
to think in terms of sets, but the set concept gives a simple and uniform answer if 
anyone asks what we are “really” talking about. 
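
For a finite set of pairs, the definition can be tested mechanically. The sketch below is only an illustration of the definition; the names `is_function`, `domain`, and `range_of`, and the two sample sets, are hypothetical and not part of the text.

```python
def is_function(pairs: set) -> bool:
    """True if the set contains at most one pair (x, y) for each x."""
    seen = {}
    for x, y in pairs:
        if x in seen and seen[x] != y:
            return False          # two different y values for the same x
        seen[x] = y
    return True

def domain(pairs: set) -> set:
    """The set of x values occurring in the pairs."""
    return {x for x, _ in pairs}

def range_of(pairs: set) -> set:
    """The set of y values occurring in the pairs."""
    return {y for _, y in pairs}

square = {(x, x * x) for x in range(-3, 4)}   # a function: y = x^2 on -3..3
circle = {(0, 1), (0, -1), (1, 0)}            # not a function: x = 0 has two y values

print(is_function(square), is_function(circle))
```

Notice that no "rule" appears anywhere in `is_function`; only the pairs themselves are inspected.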


Exercises 


The rational functions are similar to numbers in several respects: they can be added, subtracted, 
multiplied, and divided (except by 0), and they satisfy the same rules of algebra, such as a+b = b+a 
and ab = ba. More surprisingly, they can be ordered. If f and g are rational functions, and f ≠ g, 
we say that f < g or g > f if g(x) is ultimately greater than f(x); that is, if g(x) > f(x) for 
all sufficiently large x. Then, if f and g are any rational functions, either f < g or g < f. The 
following exercises explain why. 


1.5.1 If f(x) = x, g(x) = x° + 100, h(x) = 2°, show that f < g < h. 

1.5.2. If f and g are polynomials, explain how to tell which of f, g is the greater. 

1.5.3, Show that aa > act for x sufficiently large. 

1.5.4 If f and g are rational functions, explain how to tell which of f, g is greater by reducing to 
the same question about polynomials. 


Notice also that the rational functions include a copy of the real numbers, if we associate each real 
number a with the constant function f(x) = a. Thus, the ordered set of rational functions is a kind 
of “expanded line,” with new points corresponding to the nonconstant functions. 

The new points, however, make the set of rational functions less suitable, as a model of the 
intuitive line, than the set of real numbers. One reason is that the rational function “line” includes 
infinitesimals—positive functions that are smaller than any positive real number. 


1.5.5 Show that the rational function ι(x) = 1/x is greater than zero but less than any positive constant 
function. (The symbol ι is the Greek letter iota, which seems appropriate for an infinitesimal 
function.) 
1.6 What Is Continuity? 


In the previous section we observed the concept of continuous function on our way 
to the general concept of function. But the concept of continuous function, though 
it arose earlier than the general concept, turns out to be much harder to define 
precisely. 

Our intuitive concept of a continuous function is one whose graph is an unbroken 
curve. We note in passing that the concept of a “curve” is thereby linked to the 
concept of a continuous function, and we later (Sect. 4.4) take up what this says 
about the concept of curve. But for now we will concentrate on the meaning of 
“unbrokenness,” or the absence of gaps, which is a concept called connectedness. 
We have already noted that connectedness is an attribute of the line. But the concept 
of continuous function is more general, because it has meaning whether or not the 
domain of the function is connected. 

We will eventually want to study continuous functions on disconnected domains, 
but for the present we restrict ourselves to functions on R or on intervals of R such 
as [0,1] = {x : 0 ≤ x ≤ 1} or (0,∞) = {x : x > 0}. In this case we can define a 
function f to be continuous on the interval if, for any number a in the interval, f(x) 
approaches f(a) as x approaches a. There are various ways to formalize the idea of 
“approaching,” which we will discuss in later chapters. For now, we just illustrate 
the idea with two examples of functions that visibly fail to be continuous at a point 
x = a. The first example is the function 


$$g(x) = \begin{cases} -1 & \text{for } x < 0 \\ 1 & \text{for } x \geq 0. \end{cases}$$


This function fails to be continuous at x = 0 because g(x) approaches −1 (in fact 
stays constantly equal to −1) as x approaches 0 from below, yet g(0) = 1. The 
second example is the function 


$$h(x) = \begin{cases} 0 & \text{for } x \leq 0 \\ \sin\frac{1}{x} & \text{for } x > 0, \end{cases}$$


which fails to approach any value as x approaches 0 from above, because it oscillates 
between −1 and 1 no matter how close x comes to 0. The graphs of g and h in 
Fig. 1.10 clearly show the points of discontinuity. 


Fig. 1.10 Graphs of the functions g and h 
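
The oscillation of h near 0 can also be seen numerically. In the following sketch (an illustration, with the sample points `peaks` and `troughs` chosen by us) we pick positive values of x, as close to 0 as we like, at which 1/x is π/2 or 3π/2 plus a multiple of 2π, so that h(x) is 1 or −1 respectively.

```python
import math

def h(x: float) -> float:
    """The second example: 0 for x <= 0 and sin(1/x) for x > 0."""
    return math.sin(1.0 / x) if x > 0 else 0.0

# Points shrinking towards 0 where 1/x = pi/2 + 2*pi*k, so that h(x) = 1 ...
peaks = [1.0 / (math.pi / 2 + 2 * math.pi * k) for k in range(1, 6)]
# ... and where 1/x = 3*pi/2 + 2*pi*k, so that h(x) = -1.
troughs = [1.0 / (3 * math.pi / 2 + 2 * math.pi * k) for k in range(1, 6)]

for x in peaks + troughs:
    print(f"x = {x:.5f}   h(x) = {h(x):+.6f}")
```

Since both kinds of point occur in every interval (0, δ), however small, h(x) cannot settle on any single value as x approaches 0 from above.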
A continuous function on a closed interval [a, b] = {x : a ≤ x ≤ b} has properties 
that one would expect from the intuition that its graph is an unbroken curve from 
the point (a, f(a)) to the point (b, f(b)). Notably, the following: 


Intermediate Value Property. If f is a continuous function on the closed interval 
[a, b], then f takes each value between f(a) and f(b). 


Extreme Value Property. If f is a continuous function on the closed interval [a, b], 
then f takes a maximum and a minimum value on [a, b]. 


As obvious as these properties appear, they are far from trivial to prove, and they 
depend on the connectedness property of R. Indeed these properties are logically 
and historically the reason why we need to study the real numbers in the first 
place. The need to prove the intermediate value property became pressing after 
Gauss (1816) used it in a proof of the fundamental theorem of algebra, which states 
that each polynomial equation has a solution in the complex numbers. The first 
attempt to prove the intermediate value property was made by Bolzano (1817), but 
Bolzano’s proof rests on another assumption that one would like to be provable— 
the least upper bound property of R, according to which any bounded set of real 
numbers has a least upper bound. In 1858, Dedekind first realized that the least 
upper bound property is rigorously provable from a suitable definition of R, thus 
providing a sound foundation for all the basic theorems of calculus. This is why 
whole books have been written about R, such as Dedekind (1872), Huntington 
(1917), and Landau (1951). We explain the construction of R, and prove its basic 
properties, in Chap. 2. 
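
The link between the intermediate value property and the least upper bound property can be made tangible by repeated bisection. The sketch below is not a proof and is not the argument of Bolzano or Dedekind; it simply assumes that a root exists between two points where a continuous function changes sign, and closes in on it. The function name `bisect_root` and the polynomial x³ + x − 1 are our own choices for illustration.

```python
def bisect_root(f, a: float, b: float, tol: float = 1e-12) -> float:
    """Approximate a root of f in [a, b], assuming f(a) and f(b) have opposite signs."""
    if f(a) * f(b) > 0:
        raise ValueError("f(a) and f(b) must have opposite signs")
    while b - a > tol:
        mid = (a + b) / 2
        if f(a) * f(mid) <= 0:
            b = mid               # a sign change (or a root) lies in [a, mid]
        else:
            a = mid               # otherwise it lies in [mid, b]
    return (a + b) / 2

def p(x: float) -> float:
    """Illustrative polynomial (not from the text): p(0) = -1 < 0 < 1 = p(1)."""
    return x ** 3 + x - 1

root = bisect_root(p, 0.0, 1.0)
print(root, p(root))
```

The left endpoints produced by the loop form a bounded nondecreasing sequence of rational numbers. To say that they close in on an actual number, rather than on a gap, is exactly to appeal to the completeness of R.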

But the study of functions also demands that we study sets of real numbers. For 
example, when a function is discontinuous we may wish to understand how far 
it departs from continuity, and this involves studying the set of points where it is 
discontinuous. A basic question then is how large (or small) a set of points may be, 
and hence: how can we measure a set of real numbers? We expand on this question 
in the next section. 


Exercises 


1.6.1 Show that $x^k - a^k = (x - a)(x^{k-1} + a x^{k-2} + a^2 x^{k-3} + \cdots + a^{k-1})$. 

1.6.2 Deduce from Exercise 1.6.1 that x − a divides p(x) − p(a) for any polynomial p(x). 

1.6.3 Deduce from Exercise 1.6.2 that each root x = a of a polynomial equation p(x) = 0 
corresponds to a factor (x − a) of p(x), and hence that the equation p(x) = 0 has at most n 
roots when p has degree n. 

1.6.4 Show, using the intermediate value property, that x° + x+ 1 = 0 has at least one real root. 

1.6.5 More generally, show that a polynomial equation p(x) = 0 has a real root for any polynomial 
p of odd degree. 
1.7 What Is Measure? 


It is intuitively plausible that every set S of real numbers, say in [0,1], should have 
a measure, because it seems meaningful to ask: if we choose a point x at random 
in [0,1], what is the probability that x lies in S? This probability, if it exists, can be 
taken as the measure of S. Certain simple sets certainly have a measure in the sense 
of this thought experiment. The measure of the interval [a, b] should be b — a, and 
this should also be the measure of (a,b), because the measure of the point a or b 
should be zero. 

A more interesting set is the set of rationals in [0,1]. If we ask what is the 
probability that a random point is rational, the surprising answer is zero! This is 
because we can cover all the rational numbers in [0,1] by intervals of total length 
< ε, for any number ε > 0. To see why, notice that we can arrange all the rationals 
in [0,1] in the following list (according to increasing denominator): 


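$$0,\ 1,\ \tfrac{1}{2},\ \tfrac{1}{3},\ \tfrac{2}{3},\ \tfrac{1}{4},\ \tfrac{3}{4},\ \tfrac{1}{5},\ \ldots$$

So if we cover the nth rational on the list by an interval of length $\varepsilon/2^{n+1}$ then 

$$\text{total length covered} \leq \frac{\varepsilon}{2} + \frac{\varepsilon}{4} + \frac{\varepsilon}{8} + \cdots = \varepsilon.$$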


Since ε can be made as small as we please, the total measure of the rational numbers 
can only be zero. This is surely surprising! In fact, we have exposed two surprising 
facts: 


1. The rational numbers in [0,1] can be arranged in a list. 
2. Any listable set has measure zero, and hence it is not the interval [0,1]. 


These two facts together give another proof that irrational numbers exist; in fact 
they show that “almost all” numbers are irrational, because the probability that a 
randomly chosen number is rational is zero. 
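
The covering argument is easy to imitate for any finite initial part of the list. The sketch below uses exact fractions; the function names are ours, and we count the list from n = 0 (an assumption made here so that the interval lengths are ε/2, ε/4, ε/8, and so on).

```python
from fractions import Fraction
from math import gcd

def rationals_in_unit_interval(count: int) -> list:
    """First `count` rationals in [0, 1], listed by increasing denominator."""
    listed = [Fraction(0), Fraction(1)]
    q = 2
    while len(listed) < count:
        # Fractions p/q in lowest terms with 0 < p < q.
        listed.extend(Fraction(p, q) for p in range(1, q) if gcd(p, q) == 1)
        q += 1
    return listed[:count]

def cover(count: int, eps: Fraction) -> list:
    """Pair the nth listed rational (n = 0, 1, 2, ...) with an interval length eps / 2**(n + 1)."""
    return [(r, eps / 2 ** (n + 1))
            for n, r in enumerate(rationals_in_unit_interval(count))]

eps = Fraction(1, 100)
intervals = cover(50, eps)
total_length = sum(length for _, length in intervals)
print(total_length < eps)
```

However many rationals we take from the list, the total length used stays below ε, because the lengths form part of a geometric series with sum ε.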

Thus, the concept of measurability leads to unexpected discoveries about sets of 
real numbers. Even more surprising results come to light as we pursue the concept 
further, as we will do in later chapters. 


1.7.1 Area and Volume 


The idea of determining measure by adding infinitely many items actually goes back 
to ancient Greece, where the method was used by Euclid and Archimedes to find 
certain areas and volumes. A spectacular example was Archimedes’ determination 
of the area of a parabolic segment, which he found by filling the parabolic segment 
by infinitely many triangles. His method is illustrated in Fig. 1.12. For simplicity 
we take the parabola y = x², and cut off the segment shown in Fig. 1.11, between 
x = −1 and x = 1. 

Fig. 1.11 The parabolic segment 

Fig. 1.12 Filling the parabolic segment with triangles 

The first triangle has vertices at the ends and midpoint of the parabola. Between 
each of its lower edges and the parabola we insert a new triangle whose third 
vertex is also on the parabola, halfway (in x value) between the other two. Then we 
repeat the process with the lower edges of the new triangles. This creates successive 
“generations” of triangles, with each triangle in generation n + 1 having two vertices 
from a triangle in generation n and its third vertex also on the parabola halfway (in x 
value) between them. The first three generations are shown in black, gray, and light 
gray in Fig. 1.12. 

It is easy to check (see exercises) that the area of generation n + 1 is 1/4 of the 
area of generation n, so we can find the area by summing a geometric series: 

$$\text{area of parabolic segment} = \left(1 + \frac{1}{4} + \frac{1}{4^2} + \cdots\right) \times \text{area of the first triangle} = \frac{4}{3} \times 1 = \frac{4}{3}.$$
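
The construction can be carried out exactly for the first few generations. The following sketch is a numerical check, not a replacement for the exercises; the function names are ours and the area of each triangle is computed by the usual coordinate ("shoelace") formula.

```python
from fractions import Fraction

def triangle_area(p, q, r) -> Fraction:
    """Area of the triangle with vertices p, q, r, by the shoelace formula."""
    (x1, y1), (x2, y2), (x3, y3) = p, q, r
    return abs((x2 - x1) * (y3 - y1) - (x3 - x1) * (y2 - y1)) / 2

def generation_areas(generations: int) -> list:
    """Total triangle area in each generation, for y = x^2 between x = -1 and x = 1."""
    chords = [(Fraction(-1), Fraction(1))]    # x-intervals still to be filled
    totals = []
    for _ in range(generations):
        total = Fraction(0)
        next_chords = []
        for a, b in chords:
            m = (a + b) / 2                   # third vertex is halfway in x value
            total += triangle_area((a, a * a), (m, m * m), (b, b * b))
            next_chords += [(a, m), (m, b)]   # the two lower edges of the new triangle
        totals.append(total)
        chords = next_chords
    return totals

areas = generation_areas(8)
print(areas[:4], sum(areas))
```

Each generation has a quarter of the area of the one before, and the running total creeps up towards 4/3 without ever reaching it in finitely many steps.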
Exercises 


The following exercises verify the areas of the triangles in Archimedes’ construction. 


1.7.1 Show that the triangle with vertices (which lie on the parabola y = x²) 

$$(a, a^2), \quad \left(\frac{a+b}{2}, \left(\frac{a+b}{2}\right)^2\right), \quad (b, b^2) \quad \text{has area} \quad \left(\frac{b-a}{2}\right)^3.$$

1.7.2 Deduce from Question 1.7.1 that each triangle in generation n has area $2^{3-3n}$, and hence that 
the total area of generation n is $2^{2-2n}$. 

1.7.3. Deduce from Question 1.7.2 that the total area of all triangles inside the parabolic segment 
is 4/3. 


Virtually the same geometric series occurs in Euclid’s determination of the volume of the 
tetrahedron, in the Elements, Book XII, Proposition 4. Euclid uses two prisms whose edges join 
the midpoints of the tetrahedron edges (Fig. 1.13). 


1.7.4 Assuming that the volume of a prism is triangular base area × height, show that the prisms 
in Fig. 1.13 have volume equal to 1/4 (tetrahedron base area) × (tetrahedron height). 


Now remove these two prisms, and repeat the process in the two tetrahedra that remain, as in 
Fig. 1.14. 


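1.7.5 By repeating the argument of Exercise 1.7.4, show that 

$$\text{volume of tetrahedron} = \left(\frac{1}{4} + \frac{1}{4^2} + \frac{1}{4^3} + \cdots\right) \times \text{base area} \times \text{height} = \frac{1}{3}\ \text{base area} \times \text{height}.$$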


Fig. 1.13 Euclid’s prisms inside a tetrahedron 



Fig. 1.14 Repeated dissection of the tetrahedron 
1.8 What Does Analysis Want from R? 


From the discussions in the preceding sections, we expect that answers to several 
fundamental questions about numbers, functions, and curves will emerge from a 
better understanding of the system R of real numbers. To obtain good answers, we 
seem to want R to have the following (as yet only vaguely defined) properties. 


1. Algebraic structure. 

Since the members of R are supposed to be numbers, they should admit sum, 
difference, product, and quotient operations, subject to the usual rules of algebra. 
For example, it should be true that ab = ba and that (√2)² = 2. 

2. Completeness. 

R should be arithmetically complete, in the sense that certain infinite operations 
on R have results in R. For example, the infinite sum represented by an 
infinite decimal, such as 

$$3.14159\cdots = 3 + \frac{1}{10} + \frac{4}{10^2} + \frac{1}{10^3} + \frac{5}{10^4} + \frac{9}{10^5} + \cdots$$

should be a member of R. 

R should also be geometrically complete, in the sense of having no gaps, like 
a continuous line. From this property we hope to derive a concept of continuous 
function with the expected properties, such as the intermediate value property. 

Hopefully, arithmetic and geometric completeness will be equivalent, so both 
can be achieved simultaneously. Also, if R behaves like a line, then R² should 
behave like a plane. 

3. Measurability of subsets (of both R and R²). 

We hope for a definition of measure that gives a definite measure to each 
subset of R, or at least to each “clearly defined” subset of R. We hope the same 
for subsets of R², because the subsets of the plane R² include the “regions under 
curves y = f(x),” the measure of which should represent the integral of f. 

Thus, the problem of finding the measure of plane sets includes the problem 
of finding integrals—one of the fundamental problems of analysis. 


Exercises 


Large parts of analysis also depend on the complex numbers, which are numbers of the form a + bi, 
where a and b are real and i² = −1. To avoid a separate discussion of complex numbers in this 
book we show that their properties reduce to those of the real numbers. An elegant way to do this 
is to represent each complex number a + bi by the 2 × 2 matrix 

$$\begin{pmatrix} a & b \\ -b & a \end{pmatrix} \quad \text{where } a \text{ and } b \text{ are real.}$$
Then the sum and product of complex numbers are the sum and product of matrices, which are 
defined in terms of the sum and product of real numbers, so they inherit their algebraic properties 
from those of the real numbers. 
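
As a sanity check on this representation (not a substitute for the exercises below), the following sketch compares matrix arithmetic with Python's built-in complex arithmetic for two sample numbers. The helper names `as_matrix`, `mat_add`, and `mat_mul` and the sample values are hypothetical choices of ours.

```python
def as_matrix(z: complex):
    """Represent a + bi by the 2 x 2 matrix with rows (a, b) and (-b, a)."""
    a, b = z.real, z.imag
    return ((a, b), (-b, a))

def mat_add(m, n):
    """Entrywise sum of two 2 x 2 matrices."""
    return tuple(tuple(m[i][j] + n[i][j] for j in range(2)) for i in range(2))

def mat_mul(m, n):
    """Usual row-by-column product of two 2 x 2 matrices."""
    return tuple(
        tuple(sum(m[i][k] * n[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )

u, v = complex(2, 3), complex(-1, 4)

# Adding or multiplying the matrices agrees with adding or multiplying the numbers.
print(mat_add(as_matrix(u), as_matrix(v)) == as_matrix(u + v))
print(mat_mul(as_matrix(u), as_matrix(v)) == as_matrix(u * v))
```

The only operations used inside `mat_add` and `mat_mul` are sums and products of real numbers, which is the point of the construction.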


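1.8.1 Writing 

$$\mathbf{1} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \quad \mathbf{i} = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix},$$

show that $\mathbf{i}^2 = -\mathbf{1}$. 

1.8.2 If $\mathbf{0} = \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}$ and $a\mathbf{1} + b\mathbf{i} \neq \mathbf{0}$, show that $(a\mathbf{1} + b\mathbf{i})^{-1}$ exists, and 

$$(a\mathbf{1} + b\mathbf{i})^{-1} = \frac{1}{a^2 + b^2}(a\mathbf{1} - b\mathbf{i}).$$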


1.8.3. Show, using the properties of matrix addition and multiplication, that for any complex 
numbers u, v, w: 


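$$\begin{aligned}
u + v &= v + u, & uv &= vu, \\
u + (v + w) &= (u + v) + w, & u(vw) &= (uv)w, \\
u + \mathbf{0} &= u, & u \cdot \mathbf{1} &= u, \\
u + (-u) &= \mathbf{0}, & u \cdot u^{-1} &= \mathbf{1} \quad \text{for } u \neq \mathbf{0}, \\
u(v + w) &= uv + uw.
\end{aligned}$$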


1.9 Historical Remarks 


As can be seen from the early sections of this chapter, some fundamental problems 
of analysis arose long before the development of calculus, and they were not solved 
until long after. It is fair to say, however, that calculus focused the attention of 
mathematicians on infinite processes, and it drove the search for answers to the 
fundamental questions. It turned out that the ancients themselves were close to 
answers—or so it seems with the advantage of hindsight—but they were held back 
by fear of infinity. 

Much of what we know about the ancient Greek understanding of numbers 
and geometry comes from Euclid’s Elements, written around 300 BCE. We know 
that Euclid collected ideas from earlier mathematicians, such as Eudoxus, but the 
Elements is the first known systematic presentation of mathematics. It covers both 
geometry and number theory, and it struggles with the problem that divides them: 
the existence of irrational quantities. The longest book in the Elements, Book X, 
is devoted to the classification of irrational quantities arising in geometry. And 
the most subtle book in the Elements, Book V, is devoted to Eudoxus’ “theory of 
proportions,” which seeks to deal with irrational quantities by comparing them with 
the rationals. 
In the next chapter we will see how that idea of comparing an irrational quantity 
with rationals was revived by Dedekind in the nineteenth century to provide a 
concept of real numbers making up a number line, thus providing an arithmetical 
foundation for geometry. The novel part of Dedekind’s idea is its acceptance of 
infinite sets—an idea that the Greeks rejected. 

Another idea of Eudoxus, the “method of exhaustion,” was also pushed further 
in the late nineteenth century theory of measure. Just as the theory of proportions 
compares a complicated (irrational) number with simple ones (rationals), the 
method of exhaustion compares a complicated geometric object with simple (and 
measurable) ones, such as triangles or rectangles. A typical example of exhaustion 
is Archimedes’ determination of the area of a parabolic segment by comparing it 
with collections of triangles (Sect. 1.7). Although there are infinitely many triangles 
in the construction, the Greeks avoided considering their infinite totality by showing 
that the area of the parabolic segment can be approximated arbitrarily closely by 
finitely many of them. 

Thus, by summing a finite geometric series, one can show that any area less 
than 4/3 may be exceeded by a finite collection of triangles inside the parabolic 
segment. The possibility that the segment has area less than 4/3 is thereby ruled 
out, and the only remaining possibility is that its area equals 4/3. This is what the 
word “exhaustion” means in this context: one finds the exact value of the area by 
exhausting all other possible values. In the late nineteenth century it was found 
that extremely complicated geometric objects, called measurable sets, could be 
measured by a similar method. The objects in question are again approximated by 
finite collections of simple objects (line intervals or rectangles), but showing that 
the approximation is arbitrarily close may require the use of infinite collections. 
This will be explained in Chap. 9. 

In this sense, we can say that the ancient Greeks came close to answering 
the basic questions about number, geometry, and measure. The questions about 
functions and continuity are another story, very much a product of the development 
of calculus in the seventeenth and eighteenth centuries. 

As mentioned in Sect. 1.5, “functions” were originally things described by 
“formulas,” though formulas could be infinite power series or trigonometric series. 
But when Bernoulli (1753) conjectured that the shape of an arbitrary string could 
be expressed by a trigonometric series it was still thought that such a function must 
be continuous. This was disproved by the general theory of trigonometric series 
developed by Fourier (1807). Among other examples, Fourier exhibited the “square 
wave” function 


$$\cos x - \frac{\cos 3x}{3} + \frac{\cos 5x}{5} - \frac{\cos 7x}{7} + \cdots$$

which jumps from the value π/4 on the interval (−π/2, π/2) to the value −π/4 on 
the interval (π/2, 3π/2). 
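
The jump can be observed numerically at the midpoints of the two intervals. This is only an illustrative sketch, with a function name of our own choosing; at x = 0 the series reduces to the alternating sum 1 − 1/3 + 1/5 − 1/7 + ···, and at x = π to its negative.

```python
import math

def square_partial_sum(x: float, terms: int) -> float:
    """Sum of the first `terms` terms of cos x - cos 3x/3 + cos 5x/5 - ..."""
    total = 0.0
    for k in range(terms):
        odd = 2 * k + 1
        total += (-1) ** k * math.cos(odd * x) / odd
    return total

# Midpoint of (-pi/2, pi/2) and midpoint of (pi/2, 3*pi/2).
print(square_partial_sum(0.0, 2000), math.pi / 4)
print(square_partial_sum(math.pi, 2000), -math.pi / 4)
```

The terms here shrink only like 1/n, so convergence is much slower than for the triangular wave of Sect. 1.5, whose terms shrink like 1/n².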

Despite such examples, Fourier tended to assume that functions defined by 
trigonometric series are continuous. Dirichlet (1837) was the first to insist that 
arbitrary functions really can have arbitrary values, and any general argument about 
functions should cover discontinuous functions. As an example, he introduced the 
function we call the Dirichlet function, 

$$d(x) = \begin{cases} 1 & \text{if } x \text{ is rational} \\ 0 & \text{if } x \text{ is irrational.} \end{cases}$$

Fig. 1.15 Daniel Bernoulli and Joseph Fourier 


In fact, by the end of the nineteenth century, the Dirichlet function did not seem 
especially pathological. It is a limit of limits of continuous functions, and can be 
expressed by the formula 


$$d(x) = \lim_{m \to \infty} \lim_{n \to \infty} (\cos(m!\,\pi x))^{n}$$


of Pringsheim (1899), p. 7. 

Thus, the seventeenth-century idea of representing functions by formulas extends 
much further than was first thought. “Formulas” may not be available for all 
functions, but they extend far beyond the continuous functions—certainly to all 
functions obtainable from continuous functions by taking limits. These functions 
are called the Baire functions after René Baire, who first studied them in 1898. 
Baire functions, and their close relatives the Borel sets, will be discussed in Chap. 8. 

The need to clarify the concept of continuity arose, as mentioned in Sect. 1.6, 
from attempts to prove the fundamental theorem of algebra, particularly the one 
by Gauss (1816). It should be added that not only did the solution come from 
outside algebra, via the intermediate value theorem of Bolzano (1817), so too did 
the problem itself. Originally, the motivation for a fundamental theorem of algebra 
was to integrate rational functions. The method of partial fractions makes it possible 
to integrate any quotient p(x)/q(x) of polynomials, provided we can split q(x) into 
real linear and quadratic factors. Such a factorization follows from the fundamental 
theorem of algebra. The novel contribution of Gauss was to see that one should not 
attempt to find formulas for the roots of polynomial equations, but rather to deduce 
the existence of roots from general properties of continuous functions. 

Fig. 1.16 Carl Friedrich Gauss and Bernard Bolzano 

Thus, a problem about the most concrete kind of formulas, polynomials, was 
eventually solved by abstract reasoning about the general class of continuous 
functions. And, as Bolzano discovered, reasoning about continuous functions 
depends in turn on an abstract property of real numbers, the least upper bound 
property. It was to establish this property that Dedekind proposed his definition of 
the real numbers, which draws its inspiration from the ancient theory of proportions. 
Dedekind’s remarkable fusion of ancient and modern ideas will be developed in the 
next chapter. 


Chapter 2 
From Discrete to Continuous 


PREVIEW 


The questions raised in the introductory chapter stem from a single problem: bridging 
the gap between the discrete and the continuous. Discreteness is exemplified 
by the positive integers 1,2,3,4,..., which arise from counting but also admit 
addition and multiplication. Continuity is exemplified by the concept of distance on 
a line, which arises from measurement but also admits addition and multiplication. 
The problem is to find a concept of real number that embraces both counting and 
measurement, and satisfies the expected laws of addition and multiplication. 

We begin by laying the simplest possible foundation for arithmetic on the positive 
integers, the principle of induction, which expresses the idea that every positive 
integer can be reached by starting at 1 and repeatedly adding 1. The corresponding 
method of proof by induction then allows us to prove the basic laws of arithmetic. 
From here it is only a short step to the arithmetic of positive rational numbers—the 
ratios m/n of positive integers m and n. 

With the laws of arithmetic established on the foundation of induction we can 
concentrate on constructing the real numbers by filling the gaps in the rationals. This 
is the step that completes the transition from discrete to continuous, and to carry it 
out we need an infinite process of some kind. The Dedekind cut process is the one 
favored in this book, since it extends the laws of arithmetic from rational to real 
numbers in a natural way. However, we also discuss some other infinite processes 
commonly used to describe real numbers, such as infinite decimals and continued 
fractions. 


2.1 Counting and Induction 


The origins of mathematics are lost in prehistory, but it seems reasonable to suppose 
that mathematics began with counting, so the first mathematical objects encountered 
were the positive integers 1,2,3,4,5,.... These objects, generated by the seemingly 
simple process of starting with 1 and then going from each number to its successor, 
are not only infinitely numerous, but also infinitely rich in beauty and complexity. 
Evidence from Mesopotamia (Sect. 1.4) suggests that it was known as early as 
1800 BCE that positive integers have remarkable properties involving addition and 
multiplication, and by 300 BCE systematic proofs of such properties were given in 
Euclid’s Elements. 

Among the methods of proof in Euclid, one sees an early form of what we now 
call induction. Induction reflects the fact that each positive integer n can be reached 
from 1 by repeatedly adding 1. So if we start with the number n we can take only 
a finite number of downward steps. We have seen one such “descent” argument 
already: the proof that √2 is irrational given in Sect. 1.2. Euclid uses the “descent” 
form of induction to prove two important results. 


1. Each integer n > 1 has a prime divisor. 
Because if n is not itself prime, it factorizes into smaller positive integers, 
$m_1$ and $n_1$, to which the same argument applies. Since each step of the 
process produces smaller numbers, it must terminate—necessarily on a prime 
divisor of n. 
2. The Euclidean algorithm terminates on any pair of positive integers a, b. 
The Euclidean algorithm, as Euclid himself described it, “repeatedly subtracts 
the lesser number from the greater.” That is, we start with the pair $(a_1, b_1) = (a, b)$, 
for which we can assume a > b, and successively form the pairs 

$$(a_2, b_2) = (\max(b_1, a_1 - b_1), \min(b_1, a_1 - b_1)),$$

$$(a_3, b_3) = (\max(b_2, a_2 - b_2), \min(b_2, a_2 - b_2)), \ldots$$

until we get $a_k = b_k$. Since the algorithm produces a decreasing sequence of 
numbers, termination necessarily occurs. 
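
Both descent arguments translate directly into loops that must stop. The sketch below is our own illustration of the two results, with hypothetical function names; it makes no attempt at efficiency, and the sample pair (21, 15) is chosen arbitrarily.

```python
def prime_divisor(n: int) -> int:
    """Descend through divisors of n > 1 until a prime divisor is reached."""
    if n <= 1:
        raise ValueError("n must be an integer greater than 1")
    while True:
        for d in range(2, n):
            if n % d == 0:
                n = d             # a smaller number that still divides the original n
                break
        else:
            return n              # no divisor strictly between 1 and n: n is prime

def subtractive_euclid(a: int, b: int) -> list:
    """All pairs produced by repeatedly subtracting the lesser number from the greater."""
    if a <= 0 or b <= 0:
        raise ValueError("both numbers must be positive integers")
    a, b = max(a, b), min(a, b)
    pairs = [(a, b)]
    while a != b:
        a, b = max(b, a - b), min(b, a - b)
        pairs.append((a, b))
    return pairs

print(prime_divisor(91))
print(subtractive_euclid(21, 15))
```

In each loop a positive integer (the current n, or the sum a + b) strictly decreases at every step, which is why neither loop can run forever.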


These two results are foundation stones of number theory. The first shows the 
existence of prime factorization for any integer > 1, and the second shows (after 
some other steps we omit here) the uniqueness of prime factorization. So induction 
is evidently a fundamental principle of proof in number theory. 

Today, we know that virtually all of number theory can be encapsulated by a 
small set of axioms—the Peano axioms—that state the basic properties of the 
successor function, definitions of addition and multiplication, and induction. In fact, 
induction is implicit in the definitions of addition and multiplication, and in Sect. 2.2 
we will see how their properties unfold when induction is applied. However, this was 
quite a late development in the history of mathematics. It took a long time for the 
idea of induction to evolve from the “descent” form used by Euclid to the “ascent” 
form present in the Peano axioms. 
Exercises 


One of the oldest problems about numbers that is still not completely understood is the problem of 
Egyptian fractions. In ancient Egypt, fractions were always expressed as sums of reciprocals, for 
example 


It is not obvious that any fraction between 0 and 1 can be expressed in this form and, indeed, the 
Egyptians probably did not know this for a fact. But it turns out that the following naive method 
always works: given a fraction a/b between 0 and 1, subtract from a/b the largest reciprocal 1/c 
that is less than a/b, then repeat the process with the fraction a/b — 1/c = a'/bc. This method, 
which was used by Fibonacci (1202), always works because a’ < a. So the process terminates in a 
finite number of steps (and it expresses m/n as a sum of distinct reciprocals). 
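
The method is short enough to state as a program. In the sketch below (our own illustration, with a hypothetical function name and the sample fraction 4/13) we take "largest reciprocal" to allow equality, an assumption made here so that the process stops as soon as the remaining fraction is itself a reciprocal.

```python
from fractions import Fraction

def egyptian(fraction: Fraction) -> list:
    """Denominators c1 < c2 < ... with fraction = 1/c1 + 1/c2 + ..."""
    if not 0 < fraction < 1:
        raise ValueError("expected a fraction strictly between 0 and 1")
    denominators = []
    while fraction > 0:
        a, b = fraction.numerator, fraction.denominator
        c = -(-b // a)                 # ceiling of b/a, so 1/c is the largest reciprocal <= a/b
        denominators.append(c)
        fraction -= Fraction(1, c)     # the new numerator divides a*c - b, which is less than a
    return denominators

denoms = egyptian(Fraction(4, 13))
print(denoms)
```

The numerators of the successive remainders form a decreasing sequence of natural numbers, so the loop is another instance of descent.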


2.1.1 Use the Fibonacci method to express 3/7 as a sum of distinct reciprocals. 
2.1.2. If 0 < a/b < 1 and 1/c is the largest reciprocal less than a/b, show that ac > b > a(c — 1). 
2.1.3. Deduce from Exercise 2.1.2 that 0 < a’ < a, where 


$$\frac{a}{b} - \frac{1}{c} = \frac{ac - b}{bc} = \frac{a'}{bc}.$$


Thus it is always possible to express a fraction as a sum of reciprocals, as the Egyptians wanted. 
However, not much is known about the number of reciprocals required, or how large their 
denominators may become. 

The book (Fibonacci 1202) is also the source of the sequence of Fibonacci numbers, 


0,1, 1,2, 3,5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 987,..., 


in which each number (after the first two) is the sum of the preceding two. 


2.1.4 Show that the Euclidean algorithm, applied to the pair of Fibonacci numbers (13, 8), 
terminates at the pair (1, 1). 

2.1.5 Explain why the Euclidean algorithm, applied to any pair of consecutive Fibonacci numbers, 
terminates at the pair (1, 1). 

2.1.6 Deduce that the greatest common divisor of any pair of consecutive Fibonacci numbers is 1. 


Another descent process that necessarily terminates is that of subtracting from n the largest power 
of 2 less than or equal to n. 


2.1.7. Use subtraction of the largest power of 2 to prove that each positive integer n can be 
expressed uniquely as a sum of distinct powers of 2. 
2.1.8 What does Question 2.1.7 have to do with binary notation? 


2.2 Induction and Arithmetic 


Although induction has been present in mathematics at least since the time of 
Euclid, the realization that it underlies even “trivial” properties of addition and 
multiplication is quite recent. Even more remarkable, this discovery was published 
in a book intended for high school students, the Lehrbuch der Arithmetik (Textbook 
of Arithmetic) of Hermann Grassmann, in 1861. Grassmann’s “new math” was 
far ahead of its time and it went unnoticed, even by mathematicians, until it was 
rediscovered by Dedekind (1888) and Peano (1889). It is still surprising to see how 
tightly induction is bound up with the properties of addition and multiplication that 
seem obvious from a visual point of view, such as a + b = b + a and ab = ba. 

We follow Grassmann by using the natural numbers 0, 1,2,3,..., rather than the 
positive integers 1,2,3,..., as it is more convenient to start with 0 when defining 
addition and multiplication. We also use S(n), rather than n + 1, to denote the 
successor of n, in order to avoid any suspicion of circularity in defining addition. 
Thus we initially denote the natural numbers by 0, S(0), SS(0), SSS(0), ..., and the 
successor function S is the only function we know. 

A proof by induction that property P holds for all natural numbers n proceeds by 
proving the base step, that P holds for 0, and the induction step, that if P holds 
for n = k then P holds for n = S(k). 


2.2.1 Addition 


On this slender foundation we now build the addition function + by the following 
inductive definition: 


m + 0 = m, 


m + S(n) = S(m + n). 


The first line defines m + 0 for all natural numbers m, while the second defines 
m + S(n) for all m, assuming that m + n has already been defined for all m. It follows, 
by induction, that m + n is defined for all natural numbers m and n. The first thing to 
notice about this definition of addition is that it implies n + 1 = S(n), as it should, 
because 1 is defined to be S(0) and so 


n + 1 = n + S(0) 
      = S(n + 0)    by definition of addition, 
      = S(n)        because n + 0 = n by definition of addition. 
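
The inductive definition of addition can be tried out mechanically. The following Python sketch is only an illustration and is not part of Grassmann's development: it assumes an encoding in which 0 is the empty tuple and S(n) is a one-element tuple containing n, and the helper names from_int and to_int are introduced here purely for convenience.

```python
# Assumed encoding (for illustration only): 0 is (), and S(n) is the tuple (n,).
ZERO = ()

def S(n):
    """Successor of n."""
    return (n,)

def add(m, n):
    """Addition by the inductive definition: m + 0 = m, m + S(k) = S(m + k)."""
    if n == ZERO:
        return m
    k = n[0]                 # n has the form S(k)
    return S(add(m, k))

def from_int(i):
    """Build SS...S(0) with i applications of S."""
    n = ZERO
    for _ in range(i):
        n = S(n)
    return n

def to_int(n):
    """Count the applications of S in the numeral n."""
    count = 0
    while n != ZERO:
        n = n[0]
        count += 1
    return count

ONE = S(ZERO)
three = from_int(3)
print(add(three, ONE) == S(three))        # checks n + 1 = S(n) for n = 3
print(to_int(add(from_int(2), three)))    # 2 + 3 as an ordinary integer
```

Only S and the two defining equations are used inside add; the conversions to and from Python integers serve only to make the output readable.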


Next, S(n) = 1 + n, by the following induction on n. For n = 0 we have 


S(0) = 1 = 1 + 0,    by definition of addition. 

And assuming that S(n) = 1 + n holds for n = k we have 


1 + S(k) = S(1 + k)    by definition of addition, 
         = SS(k)       by induction hypothesis, 
         = S(k) + 1    because S(n) = n + 1. 


So S(n) = n + 1 = 1 + n for all natural numbers n, by induction. 
Now we can use the definition to prove the algebraic properties of addition. 
A good illustration is the associative law, 


l + (m + n) = (l + m) + n    for all natural numbers l, m, n. 


We prove this for all l and m by induction on n. First, the base step n = 0. In this 
case, the left side is l + (m + 0), which equals l + m by definition of m + 0. And the right 
side (l + m) + 0 also equals l + m, for the same reason. Thus l + (m + n) = (l + m) + n 
holds for all l and m when n = 0. 

Now for the induction step: we suppose that l + (m + n) = (l + m) + n holds for 
n = k, and prove that it also holds for n = S(k). Well, 


l + (m + S(k)) = l + S(m + k)      by definition of addition, 
               = S(l + (m + k))    by definition of addition, 
               = S((l + m) + k)    by induction hypothesis, 
               = (l + m) + S(k)    by definition of addition. 


This completes the induction. 

With the help of the associative law we can prove other algebraic properties of 
addition, such as the commutative law, m + n = n + m. The steps are outlined in the 
exercises below. 


2.2.2 Multiplication 


Now that we have the addition function, we can define multiplication, written m · n 
or simply mn, by the following induction: 


m · 0 = 0, 
m · S(n) = m · n + m. 


The usual algebraic properties of multiplication follow from this definition. For 
example, we can prove the identity property, that 1 · m = m, by induction on m 
as follows. 

The base step 1 · 0 = 0 follows from the definition of multiplication. For the 
inductive step, we suppose that 1 · m = m holds for m = k. Then 


1 · S(k) = 1 · k + 1    by definition of multiplication, 
         = k + 1        by induction hypothesis, 
         = S(k)         by the proof above that k + 1 = S(k). 


This completes the induction, so 1 · m = m for all natural numbers m. 
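
A finite check is no substitute for the inductive proofs, but it can make the two definitions more concrete. The Python sketch below is an illustration under stated assumptions: Python's built-in integers stand in for the numerals 0, S(0), SS(0), ..., and the helper pred (a name introduced here) is used only to recover k from S(k). The functions add and mul use nothing but S and the defining equations.

```python
def S(n):
    """Successor. Python integers stand in for 0, S(0), SS(0), ... (an assumption)."""
    return n + 1

def pred(n):
    """The k with S(k) = n, for n != 0; used only to unpack a successor."""
    return n - 1

def add(m, n):
    # m + 0 = m;  m + S(k) = S(m + k)
    return m if n == 0 else S(add(m, pred(n)))

def mul(m, n):
    # m * 0 = 0;  m * S(k) = m * k + m
    return 0 if n == 0 else add(mul(m, pred(n)), m)

small = range(8)   # spot checks on a small range only, not a proof
identity_ok = all(mul(1, m) == m for m in small)
assoc_ok = all(add(l, add(m, n)) == add(add(l, m), n)
               for l in small for m in small for n in small)
comm_ok = all(mul(m, n) == mul(n, m) for m in small for n in small)
print(identity_ok, assoc_ok, comm_ok)
```

The checks cover the identity property, the associative law of addition, and the commutative law of multiplication for numbers below 8 only; the general statements still require induction.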


2.2.3 The Law ab = ba Revisited 


Other familiar algebraic properties of multiplication follow by induction, as outlined 
in the exercises below. The main difficulty in these proofs is the absence of algebraic 
assumptions in the definitions of addition and multiplication. So all algebraic 
properties must be proved from scratch, and it is not obvious which ones to prove 
first. It turns out, for example, that one needs to prove associativity before proving 
commutativity. 

This may be surprising, because we saw in Sect. 1.1 that ab = ba is obvious 
when we view numbers as lengths and take the product of lengths a and b to be 
their “rectangle.” But in Sect. 1.2 we saw that the concept of “length” is not as 
simple as it looks. In particular, it is not clear how to represent irrational lengths by 
numbers. If we wish to put the number concept on a sound foundation, we should 
presumably begin with the natural numbers, where Grassmann’s approach allows 
us to prove ab = ba without appeal to geometric intuition. Hopefully, we can then 
extend the number concept far enough to capture the concept of irrational length, 
while at the same time extending the proof that ab = ba without appeal to geometric 
interpretations of a, b, and ab. 

In the remainder of this chapter we explain how this program may be carried 
out. The number concept is extended in two stages: from natural numbers to 
rational numbers, and from rational numbers to real numbers. The first stage is 
fairly straightforward, and purely algebraic. The second stage is where we make 
the leap from the discrete to the continuous, avoiding the use of geometric concepts by 
introducing a concept that is more universal: the set concept. 


Exercises 


Our first goal is to prove the commutativity of addition, m + n = n + m, by induction on n. This 
depends on some of the properties of addition already proved above. 


2.2.1 Show that the base step, for n = 0, depends on proving that 0 + m = m, and prove this by 
induction on m. 

2.2.2 The induction step is that if m + k = k + m then m + S(k) = S(k) + m. Prove this implication, 
using the results m + 1 = 1 + m, S(k) = k + 1 and associativity proved above. 


Next, in order to deal with combinations of addition and multiplication, it will be useful to have 
the left distributive law, l(m + n) = lm + ln, and the right distributive law, (l + m)n = ln + mn. 


2.2.3 Prove the left distributive law by induction on n, using the definition of multiplication and 
the associativity of addition. 

2.2.4 Prove the associative law of multiplication, l(mn) = (lm)n, by induction on n. For the 
induction step use the definition of multiplication and the left distributive law. 

2.2.5 Prove the right distributive law by induction on n, using the definition of multiplication and 
the associative and commutative laws for addition. 


Finally we are ready to prove the commutative law of multiplication, mn = nm, by induction 
on n. (One wonders why this result is so hard to reach by induction, when it seemed so easy in 
Sect. 1.1. Apparently, mn = nm can be true for reasons quite different from those that first come to 
mind. See also Sect. 2.9.) 


2.2.6 Show that the base step, n = 0, follows from 0 · m = 0, and prove the latter by induction 
on m. 

2.2.7 Show that the induction step, mk = km implies m · S(k) = S(k) · m, follows from the definition 
of multiplication, identity property, and right distributive law. 


2.3 From Rational to Real Numbers 


The arithmetic of natural numbers in the previous section is easily extended to the 
arithmetic of non-negative rational numbers m/n, where m and n are any natural 
numbers with n ≠ 0. Admittedly, the definition of sum for rational numbers 


a/b + c/d = (ad + bc)/(bd) 


is quite sophisticated, and it causes a lot of grief in elementary school. But once 
this concept is mastered, it is not hard to see that the commutative and associative laws, 
and so on, extend from the natural numbers to the rational numbers m/n. Moreover, 
the rational numbers have a convenient property that the natural numbers lack: each 
rational number r ≠ 0 has a multiplicative inverse r⁻¹ such that r r⁻¹ = 1. Namely, 
if r = m/n then r⁻¹ = n/m. 
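
The sum rule and the inverse rule are easy to check on examples. In the Python sketch below, add_fractions and inverse are hypothetical helper names introduced for illustration; Python's standard fractions module is used only as an independent check of the results.

```python
from fractions import Fraction

def add_fractions(a, b, c, d):
    """Return a/b + c/d as the unreduced pair (ad + bc, bd)."""
    return (a * d + b * c, b * d)

def inverse(m, n):
    """Multiplicative inverse of m/n (m and n nonzero), as the pair (n, m)."""
    return (n, m)

num, den = add_fractions(1, 6, 1, 7)
print(num, den)                                            # 13 42
print(Fraction(num, den) == Fraction(1, 6) + Fraction(1, 7))

inv = inverse(3, 5)
print(Fraction(3, 5) * Fraction(*inv) == 1)                # r times its inverse is 1
```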

Thus there is not much difference between the arithmetic of natural numbers and 
that of the rational numbers. To extend arithmetic to irrational numbers we need to 
understand where the irrational numbers lie relative to the set of rational numbers, 
and we really have to start thinking of the rational numbers as a set, which we denote 
by Q. (The symbol Q apparently stands for “quotients.”) 

In Sects. 1.2 and 1.3 we observed the difficulties raised by the existence of 
irrational numbers, such as √2, for geometry and arithmetic. On the one hand, we 
want enough numbers to fill the line. On the other hand, we want to be able to add 
and multiply the points on the line in a way that is consistent with addition and 
multiplication of rational numbers. 

In Sect. 1.3 we floated the idea of using infinite decimals to represent irrational 
numbers, but immediately cast doubts on its practicality, due to the difficulty of 
adding and multiplying infinite decimals. This makes it hard to tell whether their 
arithmetic is even compatible with the arithmetic of rational numbers. For example, 
how would you like to verify that 


using the infinite decimals 


1/6 = 0.166666666666... 


1/7 = 0.142857142857... ? 


Nevertheless, infinite decimals do solve the problem of representing all points 
of the line, so it is worth exploring them a little further. We will see that infinite 
decimals can serve as a stepping-stone to a concept of real numbers that is not only 
faithful to the image of points on a line, but also compatible with the arithmetic 
of rational numbers. Moreover, the new real number concept comfortably includes 
numbers such as √2, and allows us to prove results such as √2 × √3 = √6. 

First, let us revisit the infinite decimal for √2, as it will help to explain why there 
are enough infinite decimals to fill a line. The infinite decimal 


√2 = 1.414213... 


is what we call the least upper bound (lub) of the following set of finite decimal 
fractions: 


1 
1.4 
1.41 
1.414 
1.4142 
1.41421 
1.414213 
... 


It is an upper bound because it is greater than each of them, and it is least because 
any number less than 1.414213... must be less in some decimal place, and hence 
less than some member of the set. Thus any number less than 1.414213... is not an 
upper bound at all. 

The number 1.414213... is also the greatest lower bound (glb) of the set 


2 
1.5 
1.42 
1.415 
1.4143 
1.41422 
1.414214 
... 


by a similar argument. Thus the irrational number √2 fills a “point-sized hole,” 
or gap, between the two sets of finite decimals 


1, 1.4, 1.41, 1.414, 1.4142, 1.41421, 1.414213, ... 
and 
..., 1.414214, 1.41422, 1.4143, 1.415, 1.42, 1.5, 2. 


There are of course many holes in the set of finite decimal fractions (a simpler 
example is 1/3), but each of them is a “point-sized hole” for the same reason that 
√2 is: we can approach it arbitrarily closely, from below or above, by finite decimal 
fractions. Thus the infinite decimals complete the number line by filling all the gaps. 
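
The two lists of finite decimals on either side of √2 can be generated with exact integer arithmetic. The following Python sketch is an illustration, not a construction used in the text: the function name sqrt2_bounds is introduced here, and it relies on the standard library function math.isqrt (the integer part of a square root) to avoid floating-point rounding.

```python
from fractions import Fraction
from math import isqrt

def sqrt2_bounds(k):
    """Return the largest k-place decimal whose square is less than 2,
    together with the next k-place decimal above it."""
    scale = 10 ** k
    lower = isqrt(2 * scale * scale)     # integer part of sqrt(2) * 10^k
    return Fraction(lower, scale), Fraction(lower + 1, scale)

for k in range(7):
    lo, hi = sqrt2_bounds(k)
    # lo lies below the gap and hi above it; their distance is 10^(-k)
    print(float(lo), float(hi), lo * lo < 2 < hi * hi)
```

For k = 0, 1, ..., 6 the lower values are 1, 1.4, 1.41, ..., 1.414213 and the upper values are 2, 1.5, 1.42, ..., 1.414214, and the distance between them shrinks by a factor of 10 at each step.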

Looking back over this explanation, one sees that there is nothing special about 
the set of finite decimal fractions. We could use any fractions that are plentiful 
enough to approach each point arbitrarily closely. For example, the binary fractions 
m/2ⁿ (for integers m, n) suffice. At the other extreme, one could use all the rational 
numbers m/n. This gives the advantage of a simple approach to addition and 
multiplication, as we will see in the next section. So let us see what gaps look 
like in the set of rational numbers. It is convenient to consider just the set Q⁺ of 
non-negative rationals for now. 

A gap occurs wherever the set Q⁺ breaks into two sets, L and U, with the 
following properties. 


1. Each member of L is less than every member of U. 
2. L has no greatest member and U has no least member. 

Under these circumstances, the gap between L and U represents a single irrational 
number, equal to lub L and glb U. An example is where 


L = {r ∈ Q⁺ : r² < 2},    U = {r ∈ Q⁺ : r² > 2}, 


in which case the gap represents the number √2 = lub L = glb U. 

A separation of Q⁺ into two sets L and U with the above properties is called a 
Dedekind cut after Richard Dedekind, who first thought of representing irrational 
numbers in this way. His cuts occur exactly where the gaps are. You could even say 
that the cut is the gap, so we are filling the gaps simply by recognizing the gaps as 
new mathematical objects! 

This is not a joke, but actually a deep idea. Since L and U completely 
determine an irrational number, we can take this pair of sets to be an irrational 
number. We create a new mathematical object by comprehending a collection of 
existing mathematical objects as a set. Dedekind was the first to notice the power 
of set comprehension, and in doing so he launched the program of arithmetization: 
building all of mathematics on the foundation of natural numbers and sets. 

Since either L or U completely determines the cut, it suffices to represent the 
corresponding irrational number simply by L. This is convenient because then the 
ordering of real numbers corresponds to set containment. That is, 


lub L ≤ lub L′  ⟺  L ⊆ L′. 


Pursuing this idea a little further, we note that each rational number s is the lub 
of a set L of rational numbers, namely 


L = {r ∈ Q⁺ : r < s}, 


and L has the following properties of the lower part of a Dedekind cut: 


1′. L is a bounded set of positive rationals with no greatest member. 
2′. L is “closed downward,” that is, if p ∈ L and 0 < q < p, then q ∈ L. 


Thus, if we define a lower Dedekind cut to be any set L with properties 1′ and 2′, 
then every positive real number, rational or irrational, can be represented by a lower 
Dedekind cut. We thereby obtain a uniform representation of positive real numbers 
as certain sets of rational numbers, and set containment gives the usual ordering of 
numbers. In the next section we will see how lower Dedekind cuts may also be used 
to define sums, products, and square roots of positive numbers. 


Fig. 2.1 The lower Dedekind cuts for 1 and 2/3 


Fig. 2.2 The lower Dedekind cut for √2 


2.3.1 Visualizing Dedekind Cuts 


To visualize Dedekind cuts we first “spread out” the rationals as we did in Sect. 1.3. 
That is, we view n/m as the integer point (m, n) in the plane (the point at slope n/m 
from (0, 0)). Then the lower Dedekind cut corresponding to a real number r is seen 
as the set of integer points below the line through (0, 0) with slope r. Figure 2.1 
shows the lower Dedekind cuts for 1 and 2/3 in this fashion. 

Figure 2.2 shows the cut for √2. Since points are shown as small disks, “points” 
may appear to fall on the line though they are actually below it. [It is a good exercise 
to check this for some points, such as (7, 5).] 
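
Lower Dedekind cuts, and the picture of them as integer points below a line, can be explored with a membership test in place of an infinite set. The Python sketch below is an illustration under stated assumptions: a cut is modeled as a function that answers whether a given positive rational belongs to it, containment is only spot-checked on a finite sample, and all function names are introduced here.

```python
from fractions import Fraction

def cut_of_rational(s):
    """Lower cut of a positive rational s: all positive rationals r < s."""
    return lambda r: 0 < r < s

def cut_sqrt2(r):
    """Lower cut for the square root of 2: positive rationals r with r*r < 2."""
    return r > 0 and r * r < 2

# Finite sample of positive rationals, used only for spot checks.
sample = [Fraction(n, m) for m in range(1, 30) for n in range(1, 3 * m)]

def contained_on_sample(lower, upper):
    """True if every sampled member of `lower` is also a member of `upper`."""
    return all(upper(r) for r in sample if lower(r))

def below_sqrt2_line(m, n):
    """Is the integer point (m, n) strictly below the line of slope sqrt(2)?
    For positive m, n this is the same as n/m < sqrt(2), that is, n*n < 2*m*m."""
    return n * n < 2 * m * m

print(contained_on_sample(cut_of_rational(Fraction(7, 5)), cut_sqrt2))
print(contained_on_sample(cut_sqrt2, cut_of_rational(Fraction(3, 2))))
print(below_sqrt2_line(5, 7), below_sqrt2_line(7, 5), below_sqrt2_line(5, 8))
```

The first two checks reflect 7/5 < √2 < 3/2 as containments of cuts. The point (5, 7), our own choice of example, lies very close to the line because 7² = 49 is just below 2 · 5² = 50.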


Exercises 


As we remarked earlier, an advantage of decimals is that they instantly tell us which is the larger 
of two numbers; we have only to look at the first decimal place (from the left) where the two 
decimals differ. This advantage extends to describing the least upper bound of a set of numbers 
given by infinite decimals. 


2.3.1 Suppose that S is a set of real numbers between 0 and 1, and l is the least upper bound of 
S. How is the first decimal place of l determined by the first decimal places of the members 
of S? 

2.3.2 How is the second decimal place of l determined? 

2.3.3 Using Questions 2.3.1 and 2.3.2, give a description of the decimal expansion of l. 

2.3.4 Does a similar idea work to find the glb of S? 


Thanks to the representation of positive real numbers by lower Dedekind cuts, 
all properties of real numbers reduce to properties of rational numbers, which 
we already understand. For example, we can explain the addition of positive real 
numbers in terms of addition of rational numbers by means of the following 
definition. 


Definition. If L and L′ are lower Dedekind cuts, then L + L′ is defined by 
L + L′ = {r + r′ : r ∈ L and r′ ∈ L′}. 


In other words, the sum of two lower cuts is the set of sums of their respective 
members. Notice that we immediately have L + L′ = L′ + L, because r + r′ = r′ + r for 
rational numbers. Other algebraic properties of cuts are similarly “inherited” from 
those of rational numbers. Admittedly, we have to prove that the sets obtained in this 
way are themselves lower Dedekind cuts. This turns out to be fairly straightforward 
and, as promised, it depends only on properties of rational numbers. 


Sum theorem for Dedekind cuts. If L and L′ are lower Dedekind cuts, then so is 
L + L′, and it agrees with the ordinary sum on the rational numbers. 


Proof. The sum L + L′ = {r + r′ : r ∈ L and r′ ∈ L′} is certainly a set of rational 
numbers, bounded above by the sum of bounds on L and L′. Now suppose that 
r + r′ ∈ L + L′ and that t is a rational number less than r + r′. We have to show that t 
is also in L + L′, in other words that t = s + s′, for some s ∈ L and s′ ∈ L′. One way 
to do this is to divide t into two pieces in the ratio of r to r′: that is, let 


s = tr/(r + r′),    s′ = tr′/(r + r′). 


Then s is rational and s < r, so s ∈ L; similarly s′ ∈ L′; and clearly s + s′ = t. 
In the special case where L and L′ represent rational numbers, l and l′ say, 


L = {r ∈ Q⁺ : r < l},    L′ = {r′ ∈ Q⁺ : r′ < l′}. 


Then, for any r ∈ L and r′ ∈ L′ we have r + r′ < l + l′. Conversely, for any t < l + l′ 
we have t = r + r′ with r < l and r′ < l′; namely, let r = lt/(l + l′) and r′ = l′t/(l + l′). □ 
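
The splitting step in the proof is simple enough to run on an example. In the Python sketch below the function name split_sum and the sample values are assumptions chosen for illustration; exact rational arithmetic from the standard fractions module confirms the three claims s < r, s′ < r′ and s + s′ = t.

```python
from fractions import Fraction

def split_sum(t, r, r_prime):
    """Split a rational t < r + r' into s + s' in the ratio r : r',
    so that s < r and s' < r'."""
    total = r + r_prime
    return t * r / total, t * r_prime / total

# Sample values (our own choice): r and r' play the role of members of two cuts.
r, r_prime = Fraction(7, 5), Fraction(5, 3)
t = Fraction(3)                      # t < r + r' = 46/15
s, s_prime = split_sum(t, r, r_prime)
print(s, s_prime)
print(s < r, s_prime < r_prime, s + s_prime == t)
```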


We similarly would say that the product of two lower cuts is the set of products 
of their members. 


Definition. If L and L′ are lower Dedekind cuts, then LL′ is defined by 
LL′ = {rr′ : r ∈ L and r′ ∈ L′}. 


Product of Dedekind cuts. If L and L′ are lower Dedekind cuts, then so is LL′, 
and it agrees with the ordinary product on the rational numbers. 

Proof. The product LL′ = {rr′ : r ∈ L and r′ ∈ L′} is a set of rationals, bounded 
above by the product of the bounds on L and L′. Now suppose that rr′ ∈ LL′ and 
that t is a rational less than rr′. To show that t is also in LL′ we have to find s ∈ L 
and s′ ∈ L′ such that t = ss′. Since t/(rr′) < 1, there is a rational q such that 


t/(rr′) < q < 1, 


so we can take 

s = rq,         which is less than r because q < 1, 
s′ = t/(rq),    which is less than r′ because t/(rr′) < q, 


and this obviously gives t = ss′. 
In the special case where L and L′ represent rational numbers, l and l′ say, 


L = {r ∈ Q⁺ : r < l},    L′ = {r′ ∈ Q⁺ : r′ < l′}. 


Then, for any r ∈ L and r′ ∈ L′ we have rr′ < ll′. Conversely, for any t < ll′ we 
have t = rr′ with r < l and r′ < l′. Namely, choose a rational q with 


t/(ll′) < q < 1, 

and let 

r = lq,         which is less than l because q < 1, 
r′ = t/(lq),    which is less than l′ because t/(ll′) < q. 

This gives t = rr′ as required. □ 
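
The factoring step in the proof can also be run on an example. The Python sketch below is an illustration only: the name split_product is introduced here, and taking q to be the midpoint between t/(rr′) and 1 is our own choice, since any rational strictly between them would serve.

```python
from fractions import Fraction

def split_product(t, r, r_prime):
    """Factor a rational t with 0 < t < r * r' as s * s' with s < r and s' < r'."""
    ratio = t / (r * r_prime)        # less than 1 because t < r * r'
    q = (ratio + 1) / 2              # a rational with ratio < q < 1 (midpoint: our choice)
    s = r * q                        # s < r because q < 1
    s_prime = t / s                  # s' = t/(rq) < r' because ratio < q
    return s, s_prime

# Sample values (our own choice).
r, r_prime = Fraction(7, 5), Fraction(17, 10)
t = Fraction(2)                      # t < r * r' = 119/50
s, s_prime = split_product(t, r, r_prime)
print(s, s_prime)
print(s < r, s_prime < r_prime, s * s_prime == t)
```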


2.4.1 The Square Root of 2 


We have already seen one valid way to describe √2: by an infinite decimal. In 
Sect. 2.7 we will see another way, by a continued fraction. But neither fits easily into 
an arithmetic theory of real numbers, because it is hard to describe multiplication 
of infinite decimals (and even harder for continued fractions). Dedekind cuts, on 
the other hand, are easily multiplied, and this leads to a relatively easy treatment of 
square roots. We show how in the case of √2. 

Existence of √2. The lower Dedekind cut K = {s ∈ Q⁺ : s² < 2} represents √2, 
because 


K² = {r ∈ Q⁺ : r < 2}, 


which is the lower Dedekind cut representing 2. 


Proof. By the definition of product of lower Dedekind cuts, 
K² = {ss′ : s, s′ ∈ K} = {ss′ : s², s′² < 2}. 


We know, from the product theorem above, that K² is a lower Dedekind cut, so it 
suffices to prove that lub(K²) = 2. 

Well, if s², s′² < 2, then ss′ ≤ max(s², s′²) < 2, so lub(K²) is at most 2. To show 
that lub(K²) is at least 2 it suffices to show the following: for each rational r < 2 
there is a rational s with r < s² < 2 (because in that case s ∈ K and s² ∈ K²). 

Such an s can always be found¹ by choosing s ∈ K sufficiently close to lub(K), 
which is possible because there are rational numbers arbitrarily close to lub(K). □ 
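
The last step of the proof can be made concrete: given a rational r < 2, one can search for a rational s with r < s² < 2. The Python sketch below is one possible way to do this, not the only one. It assumes that s is taken from the finite decimals below √2, with enough decimal places that the neighbouring decimal t above √2 satisfies t − s < (2 − r)/4; the function name square_between is introduced here.

```python
from fractions import Fraction
from math import isqrt

def square_between(r):
    """For a rational r < 2, return a rational s with r < s*s < 2."""
    r = Fraction(r)
    gap = (2 - r) / 4
    k = 1
    while Fraction(1, 10 ** k) >= gap:   # need t - s = 10^(-k) < (2 - r)/4
        k += 1
    scale = 10 ** k
    # s is the largest k-place decimal with s*s < 2; t = s + 10^(-k) has t*t > 2
    return Fraction(isqrt(2 * scale * scale), scale)

for r in [Fraction(1), Fraction(199, 100), Fraction(19999, 10000)]:
    s = square_between(r)
    print(r, s, r < s * s < 2)
```

The closer r is to 2, the more decimal places are needed, which matches the idea of choosing s sufficiently close to lub(K).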


2.4.2 The Equation √2 √3 = √6 


Dedekind (1872) wrote (in the 1901 English translation, p. 22) 


Just as addition is defined, so can the other operations of the so-called elementary arithmetic 
be defined ... differences, products, quotients, powers, roots, logarithms, and in this way 
we arrive at proofs of theorems (as, e.g., √2 · √3 = √6, which to the best of my knowledge 
have never been established before). 


However, Dedekind did not go beyond defining the sum of cuts, so the proof that 
√2 √3 = √6 is not in his book either. Now that we have defined the product of cuts, 
and found the cut for √2, we are very close to such a proof. It remains to define √3, 
by the cut 


K′ = {s ∈ Q⁺ : s² < 3}, 


and to prove that 
K′² = L′ = {r ∈ Q⁺ : r < 3} 


(the cut representing 3). This is very similar to the proof above that K² is the cut L 
representing 2, and we leave it as an exercise. 
Then √2 √3 is represented by the cut KK′, with the square 


(KK′)² = K²K′² = LL′, 


which represents 2 · 3 = 6. So KK′, which represents √2 √3, also represents √6. 


¹A specific way to do this is to choose rational s and t that are close together on either side of 
lub(K). For example, find s ∈ K, t ∉ K with t < 2 and t − s < (2 − r)/4. Then we have 


t² − s² = (t + s)(t − s) < 4 · (2 − r)/4 = 2 − r, 


and hence r < s² < 2 < t². 


Exercises 


The following two exercises verify that 3 is the square of the lower Dedekind cut K′ = {s ∈ Q⁺ : 
s² < 3}, as assumed in the proof above. 


2.4.1 Show that, for each rational r < 3, there is a rational s with r < s² < 3. (Hint: Consider 
s ∈ K′ and t ∉ K′ with t < 2 and t − s < (3 − r)/4.) 


2.4.2 Deduce that K′² = {r ∈ Q⁺ : r < 3}, so K′ represents √3. 


By generalizing this argument we can show the existence of roots of all real numbers, and other 
algebraic properties. 


2.4.3 Explain why each positive real number has a square root. 

2.4.4 Explain why each positive real number has a cube root. 

2.4.5 Show that properties such as ab = ba (used in the proof that √2 √3 = √6) are inherited 
from the rational numbers. 


2.5 Order and Algebraic Properties 


Now we come back to the question asked in Sect. 1.3: what are points, and how do 
they fill the line? For simplicity we will consider just the positive real numbers that 
we have defined so far. These are the “points” (members) of a set R⁺ we will call the 
positive number line. (It is no secret that we are going to introduce negative numbers 
shortly, so as to obtain the full number line R.) 

The properties of R⁺ that make it a “line” are called order properties and they 
can be expressed in terms of the ≤ relation (which corresponds to the containment 
relation ⊆ between lower Dedekind cuts). The following properties of ≤ define what 
we call a linear order. For any a, b, c ∈ R⁺: 


a ≤ a, 
either a ≤ b or b ≤ a, 
if a ≤ b and b ≤ c then a ≤ c. 

All of these properties are obvious for positive numbers because ≤ means ⊆. 
However, a linearly ordered set of points is not necessarily what we would call a 
However, a linearly ordered set of points is not necessarily what we would call a 
“line.” 

The natural numbers are linearly ordered, but they are isolated in the sense that 
for each natural number n there is an empty space before the next natural number 
n + 1, so the natural numbers come nowhere close to filling the line. The rational 
numbers come closer, because they lie densely on the line. That is, between any two 
rational numbers there is another. But, as we know, even this dense set has gaps. 
In particular, there is a gap in the rational numbers at the position √2. What this 
means, precisely, is that the set {r ∈ Q⁺ : r² < 2} has no least upper bound in Q⁺, 
because the rationals r with r² > 2 have no least member. 

R⁺ qualifies as a “line” because it is dense and has no gaps: any bounded 
set of real numbers has a least upper bound. 

This property follows from the fact that real numbers are (or are represented by) 
certain subsets of Q⁺, namely, lower Dedekind cuts. A bounded set of real numbers 
x is therefore a set of lower Dedekind cuts Lₓ that all lie below some bound r. But 
then the union L of all the cuts Lₓ is itself a lower Dedekind cut, and it is obviously 
the least cut that contains (which means ≥, remember) all the cuts Lₓ. Thus the 
union L of the cuts Lₓ is their least upper bound.² 

The least upper bound property has important consequences in analysis, such as 
the intermediate value theorem and the integrability of continuous functions. We 
will study these results in Chap. 4. They depend on the fact that R has no gaps, 
which we call the completeness of R. 

The final definitive property of the order of R is called the Archimedean property. 
It says that, if a and b are numbers with 0 < a < b, then na > b for some natural 
number n. The property holds because 


in the lower Dedekind cut for a there is a rational p/q < a, 


in the upper Dedekind cut for b there is a rational r/s > b, 


and we can certainly find a natural number n such that 


np/q > r/s. 
For example, n = rq + 1 will do, since then np/q > rp ≥ r/s. 


The Archimedean property implies that there are no infinitesimal numbers; that 
is, numbers a > 0 with a < 1/n for all natural numbers n. If a is infinitesimal, then 
na < 1 for all natural numbers n, which contradicts the Archimedean property with 
b = 1. 
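
For rational a and b the Archimedean property can be checked directly. The Python sketch below is an illustration with names introduced here: archimedean_n computes a suitable n from the quotient b/a, while n_from_bounds follows the argument with rational bounds p/q < a and r/s > b, using n = rq + 1 as one safe choice.

```python
from fractions import Fraction
from math import floor

def archimedean_n(a, b):
    """For rationals 0 < a < b, return a natural number n with n*a > b."""
    return floor(b / a) + 1

def n_from_bounds(q, r):
    """Given p/q < a and r/s > b with p, q, r, s positive integers,
    return n = r*q + 1, which satisfies n*p/q > r/s."""
    return r * q + 1

a, b = Fraction(2, 7), Fraction(5)
n1 = archimedean_n(a, b)
n2 = n_from_bounds(q=4, r=6)       # uses the bounds 1/4 < a and 6/1 > b (our choice)
print(n1, n1 * a > b)
print(n2, n2 * a > b)
```

The second value of n is larger than necessary, which is harmless: the property only asserts that some natural number works.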


To summarize: the order of the positive real numbers is linear, dense, 
Archimedean, and complete. 


²In particular, when the bounded set consists of the cuts Lᵣ where r² < 2, its least upper bound L 
is the cut representing √2. 


2.5.1 Algebraic Properties of R 


Until now we have worked only with non-negative numbers, so as to take the 
shortest route to Dedekind cuts and their multiplication. Looking back at the route 
taken, it is easy to see how negative numbers can be carried along as well. 

We began with the natural numbers, 0, 1, 2, 3, ..., showing that they have the 
following algebraic properties (all provable by induction): 


a + b = b + a                  ab = ba            (commutative laws) 
a + (b + c) = (a + b) + c      a(bc) = (ab)c      (associative laws) 
a + 0 = a                      a · 1 = a          (identity laws) 
a(b + c) = ab + ac                                (distributive law) 


When we adjoin the negative integer −m for each positive m the above laws are 
preserved if we let (−m)n = −mn, and we gain the additive inverse law: 


a + (−a) = 0. 


These eight algebraic laws define what is called a ring (or more precisely, a 
commutative ring with unit). We call the natural numbers and their negatives the 
integers, and denote the ring of integers by Z. The symbol comes from the German 
word “Zahlen” for “numbers.” 

The quotients m/n of integers m, n with n ≠ 0 form the set Q of (positive 
and negative) rational numbers, and they satisfy all the laws of a ring, plus the 
multiplicative inverse law: 


a · a⁻¹ = 1    for a ≠ 0, 


where a⁻¹ = n/m if a = m/n. A structure satisfying these nine laws is called a field. 

Our final objective is to introduce negative real numbers so that the resulting set 
R of real numbers is a field. If we continue to define positive real numbers r as 
subsets of the set Q⁺ of positive rationals, we need to adjoin 0 and a negative real 
−a for each positive a. Then the rule (−a)b = −ab ensures that all the field laws 
continue to hold. 

Now let us observe how the order of R interacts with sums and products. Clearly, 
we have 


0 < 1, 
if a < b then a + c < b + c, 
if a < b and c > 0 then ac < bc. 


Given that ≤ is a linear order, as defined above, a field with the latter three properties 
is called an ordered field. Since the order of real numbers is complete we have: R is 
a complete ordered field. 

It is not necessary to add that R is Archimedean, because this property actually 
follows from completeness in an ordered field. 


Archimedean property of a complete ordered field. If F is a complete ordered 
field and a, b ∈ F with 0 < a < b, then na > b for some natural number n. 


Proof. Suppose, on the contrary, that na ≤ b for each natural number n. Then the 
set {a, 2a, 3a, ...} is bounded and hence has a least upper bound c, by completeness. 
Since c − a < c it follows that c − a < na for some natural number n. But then 
c < (n + 1)a, contrary to the definition of c. □ 


Exercises 


When one first meets infinite decimals, it seems hard to believe that 0.9999... = 1, because it 
seems that there should be an “infinitesimal” difference between 1 and 0.9999... 


2.5.1 Show, on the contrary, how this is a good illustration of the Archimedean property of real 
numbers. 


We saw an example of a non-Archimedean linearly ordered set in the exercises to Sect. 1.5, 
namely the set R of rational functions with real coefficients, in which the constant functions 
represent the real numbers and the function 1/x is an infinitesimal. 


2.5.2 Show that R is not complete, because the set of infinitesimal functions is bounded but has 
no least upper bound. 


R is in fact the only complete ordered field (“up to isomorphism”), as the following exercises 
show. Suppose F is such a field. 


2.5.3 Deduce from the properties 0 < 1 and a + c < b + c when a < b that F contains elements 
··· < −2 < −1 < 0 < 1 < 2 < ···, 
and hence a copy of the integers. 

2.5.4 Deduce, by forming quotients, that F contains a copy of the rationals. 

2.5.5 For each x ∈ F, consider the lower Dedekind cut Lₓ consisting of the rationals in F that 
are < x. Deduce from completeness that lub Lₓ = x, so the elements of F are in bijective 
correspondence with the real numbers. 


2.6 Other Completeness Properties 


Absence of gaps may be the most intuitive way to think of completeness, but often 
it is better to think of completeness as a guarantee that certain infinite processes lead 
to a result. Forming an infinite decimal is one such example. We now discuss two 
others, which often arise in analysis. 

The first concerns closed intervals, the sets of the form 
[a, b] = {x ∈ R : a ≤ x ≤ b}, 


and the second (which follows from the first) concerns convergence of sequences. 
We will show only that these two results follow from the lub (and glb) property; 
however, it can also be shown that they imply it. 


Nested Interval Property. If I₁ ⊇ I₂ ⊇ I₃ ⊇ ··· are closed intervals with lengths 
that become arbitrarily small, then I₁, I₂, I₃, ... have a single common point. 


Proof. Let I₁ = [a₁, b₁], I₂ = [a₂, b₂], ..., so we have 


a₁ ≤ a₂ ≤ a₃ ≤ ··· ≤ b₃ ≤ b₂ ≤ b₁. 


It follows, by completeness, that lub{a₁, a₂, a₃, ...} and glb{b₁, b₂, b₃, ...} both exist. 
We therefore have 

a₁ ≤ a₂ ≤ a₃ ≤ ··· ≤ lub{a₁, a₂, a₃, ...} 
   ≤ glb{b₁, b₂, b₃, ...} ≤ ··· ≤ b₃ ≤ b₂ ≤ b₁. 

Thus any x in the interval from lub{a₁, a₂, a₃, ...} to glb{b₁, b₂, b₃, ...} is common to 
all of I₁, I₂, .... 
If the length of the intervals I₁, I₂, ... becomes arbitrarily small, then 


x = lub{a₁, a₂, a₃, ...} = glb{b₁, b₂, b₃, ...} 


is the only common point. □ 

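
A familiar source of nested closed intervals is repeated halving. The Python sketch below is an illustration under stated assumptions: it starts from [1, 2], keeps whichever half still has a² < 2 < b², and uses exact rational endpoints. The function name nested_intervals and the choice of √2 as the common point are ours.

```python
from fractions import Fraction

def nested_intervals(steps):
    """Return closed intervals [a, b] with a*a < 2 < b*b,
    each one a half of the previous interval."""
    a, b = Fraction(1), Fraction(2)
    intervals = [(a, b)]
    for _ in range(steps):
        mid = (a + b) / 2
        if mid * mid < 2:
            a = mid          # the common point lies in the upper half
        else:
            b = mid          # the common point lies in the lower half
        intervals.append((a, b))
    return intervals

intervals = nested_intervals(20)
for a, b in intervals[:5]:
    print(a, b, b - a)
```

The left endpoints never decrease, the right endpoints never increase, and the lengths 1, 1/2, 1/4, ... become arbitrarily small, so the hypotheses of the nested interval property hold for this sequence.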
We now use nested intervals to study limit points of sequences. 


Definition. A sequence of numbers c₁, c₂, c₃, ... converges if it has a limit c, that 
is, if cₙ becomes arbitrarily close to c as n increases. More precisely, c is the limit 
of the sequence c₁, c₂, c₃, ... if, for each number ε > 0, there is a natural number N 
such that 


n > N  ⇒  |c − cₙ| < ε. 


We would like to be able to tell whether a sequence converges without knowing in 
advance what its limit is. The Cauchy convergence criterion makes this possible. 

Cauchy convergence criterion. A sequence c₁, c₂, c₃, ... converges if, for each ε > 0, 
there is an N such that 


m, n > N  ⇒  |cₘ − cₙ| < ε. 

Proof. If the sequence c₁, c₂, c₃, ... satisfies the Cauchy convergence criterion there 
is a sequence of natural numbers N₁ < N₂ < N₃ < ··· such that 


m, n > N₁  ⇒  |cₘ − cₙ| < 1/2, 
m, n > N₂  ⇒  |cₘ − cₙ| < 1/4, 
m, n > N₃  ⇒  |cₘ − cₙ| < 1/6, 


and so on. Now if |cₘ − cₙ| < 1/2 for all m, n > N₁ this means in particular that 
all cₙ stay within distance 1/2 of c_(N₁+1) for n > N₁, and hence within an interval of 
length 1. Similarly for N₂, N₃, ...; so we get nested closed intervals 


I₁ of length 1, with cₙ ∈ I₁ for all n > N₁, 
⊇ I₂ of length 1/2, with cₙ ∈ I₂ for all n > N₂, 
⊇ I₃ of length 1/3, with cₙ ∈ I₃ for all n > N₃, 
... 


the length of which becomes arbitrarily small. By the nested interval property, there 
is a single point c common to these intervals, and c is clearly the limit of the 
sequence c₁, c₂, c₃, .... □ 
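
The Cauchy criterion can be spot-checked on a concrete sequence without referring to its limit. The Python sketch below is an illustration, not a proof: it assumes the sequence of finite decimal truncations of √2, indexed from 0 rather than 1, and the names c and cauchy_index are introduced here. For this particular sequence all terms from index N onward lie within 10⁻ᴺ of each other, which is what cauchy_index exploits.

```python
from fractions import Fraction
from math import isqrt

def c(n):
    """The n-place decimal truncation of sqrt(2): 1, 1.4, 1.41, ... (indexed from 0)."""
    scale = 10 ** n
    return Fraction(isqrt(2 * scale * scale), scale)

def cauchy_index(eps):
    """Return an N with 10^(-N) < eps; for this sequence, |c(m) - c(n)| < eps
    whenever m, n > N."""
    N = 0
    while Fraction(1, 10 ** N) >= eps:
        N += 1
    return N

eps = Fraction(1, 1000)
N = cauchy_index(eps)
# Largest difference among a finite stretch of terms beyond N (a spot check only).
worst = max(abs(c(m) - c(n))
            for m in range(N + 1, N + 12)
            for n in range(N + 1, N + 12))
print(N, worst < eps)
```

Only differences between terms of the sequence are examined; the value of the limit is never used, which is exactly the advantage of the criterion.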


We see from this proof that the Cauchy convergence criterion guarantees a 
limit because of the nested interval property, and hence ultimately because of the 
completeness of R. Conversely, as mentioned above, if each sequence satisfying 
the Cauchy criterion has a limit then R is complete. Indeed the completeness of R 
is often expressed this way: every Cauchy sequence has a limit, where a Cauchy 
sequence is one satisfying the Cauchy convergence criterion. This may seem more 
longwinded than, say, the least upper bound property, but it is important for several 
reasons. 

One is that sequences of numbers are very common—we have already seen 
several examples—so we need to understand the concept of convergence. Moreover, 
numbers can very well be complex, and hence not ordered, so the concept of 
Dedekind cut may not apply. Another reason is that sequences of functions are also 
common, and we can use the Cauchy convergence criterion for functions, where the 
concept of Dedekind cut also does not generally apply. 


Exercises 


2.6.1 Define a sequence of nested open intervals with no common point. 


The nested interval property and the Cauchy convergence criterion are both nicely illustrated by
infinite decimals.

2.6.2 Interpret the infinite decimal for $\sqrt{2}$ as the common point of a sequence of nested intervals.

2.6.3 Show that the sequence 1, 1.4, 1.41, 1.414, 1.4142, 1.41421, ..., which defines the infinite decimal
for $\sqrt{2}$, satisfies the Cauchy convergence criterion.


Consider nested sequences of intervals $I_1 \supseteq I_2 \supseteq I_3 \supseteq \cdots$ with lengths that converge to zero, where
$I_n = [a_n, b_n]$ and $a_n$, $b_n$ are rational.


2.6.4 Show that each such sequence corresponds to a Dedekind cut. 

2.6.5 Show that two such sequences have the same common point if and only if they correspond 
to the same Dedekind cut. 

2.6.6 Deduce that if real numbers are defined by such sequences, then we get real numbers with 
the same properties as those defined by Dedekind cuts. 

2.6.7 Deduce in turn that real numbers can be defined by Cauchy sequences. 


2.7 Continued Fractions 


The Euclidean algorithm from Sect. 2.1, which operates on a pair (a, b) of positive 
integers, can also be viewed as a procedure for expressing each positive rational a/b 
as a continued fraction. Indeed, the continued fraction elegantly encodes the main 
steps in the algorithm. 

Here is an example. We operate on the numbers 19 and 7, by first encoding them 
in the fraction 19/7. 


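$$\begin{aligned}
\frac{19}{7} &= 2 + \frac{5}{7} && \text{subtracting 7 twice from 19,}\\
&= 2 + \cfrac{1}{7/5} = 2 + \cfrac{1}{1 + \cfrac{2}{5}} && \text{subtracting 5 once from 7,}\\
&= 2 + \cfrac{1}{1 + \cfrac{1}{5/2}} = 2 + \cfrac{1}{1 + \cfrac{1}{2 + \cfrac{1}{2}}} && \text{subtracting 2 twice from 5.}
\end{aligned}$$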


The numbers 2, 1, 2, 2 occurring in the continued fraction for 19/7 record the number 
of times the smaller number can be subtracted from the larger at each stage. In other 
words, they record the quotient when the larger number is divided by the smaller. 
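The correspondence between the Euclidean algorithm and the continued fraction can be sketched in a few lines of code. This is a minimal illustration, not part of the original text; the function names `continued_fraction` and `evaluate` are our own.

```python
from fractions import Fraction


def continued_fraction(a, b):
    """Quotients produced by the Euclidean algorithm on positive integers a, b."""
    quotients = []
    while b != 0:
        q, r = divmod(a, b)  # subtract b from a as many times as possible
        quotients.append(q)
        a, b = b, r          # continue with the smaller number and the remainder
    return quotients


def evaluate(quotients):
    """Rebuild the fraction from its quotients, working from the bottom up."""
    value = Fraction(quotients[-1])
    for q in reversed(quotients[:-1]):
        value = q + 1 / value
    return value


print(continued_fraction(19, 7))            # [2, 1, 2, 2]
print(evaluate(continued_fraction(19, 7)))  # 19/7
```

Using `divmod` replaces repeated subtraction by a single division with remainder, which is the same step recorded by each quotient.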

Given any number $m/n$, where $m$ and $n$ are positive integers, we can similarly
show that

$$\frac{m}{n} = n_1 + \cfrac{1}{n_2 + \cfrac{1}{\ddots + \cfrac{1}{n_{k-1} + \cfrac{1}{n_k}}}}$$

where $n_1, n_2, \ldots, n_k$ are positive integers. A fraction of this form is called a
finite continued fraction. The fraction terminates because the Euclidean algorithm
terminates.

To prove that $\sqrt{2}$ is irrational it therefore suffices to show that the Euclidean
algorithm does not terminate on the pair of numbers $\sqrt{2}$, 1. Surprisingly, this is not
hard to do. Here is what happens. The secret is to use the fact that
$(\sqrt{2} + 1)(\sqrt{2} - 1) = 1$. As above, whenever we get a number less than 1 we rewrite it
as 1/(number greater than 1), in order to continue the fraction.

$$\begin{aligned}
\sqrt{2} &= 1 + (\sqrt{2} - 1) && \text{subtracting 1 once from } \sqrt{2},\\
&= 1 + \cfrac{1}{\sqrt{2} + 1} && \text{because } (\sqrt{2} + 1)(\sqrt{2} - 1) = 1,\\
&= 1 + \cfrac{1}{2 + (\sqrt{2} - 1)} && \text{subtracting 1 twice from } \sqrt{2} + 1,\\
&= 1 + \cfrac{1}{2 + \cfrac{1}{\sqrt{2} + 1}} && \text{because } (\sqrt{2} + 1)(\sqrt{2} - 1) = 1.
\end{aligned}$$

At this point it is clear that the Euclidean algorithm will not terminate, because the
denominator $\sqrt{2} + 1$ has occurred previously.

It follows that $\sqrt{2}$ is not equal to any ratio $m/n$ of positive integers. So we
have again proved that $\sqrt{2}$ is irrational, and this time without using proof by
contradiction.

Moreover, we now have a clearer view of the irrational number $\sqrt{2}$. It can be
described by a simple repetitive process, the Euclidean algorithm on $\sqrt{2}$ and 1. And
this gives a simple repetitive formula for $\sqrt{2}$, namely its infinite continued fraction:

$$\sqrt{2} = 1 + \cfrac{1}{2 + \cfrac{1}{2 + \cfrac{1}{2 + \ddots}}}$$

If this fraction makes sense (which it does, as we will prove rigorously below) then
it is much more transparent than the infinite decimal for $\sqrt{2}$, because we can survey
its totality: a 1 followed by infinitely many 2s. In this sense, it is as transparent as
an ultimately periodic decimal, such as the decimal 0.166666... that represents 1/6.
For more on ultimate periodicity in continued fractions, see the exercises below.
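The repeating pattern can be observed experimentally by running the same subtract-and-invert process in floating-point arithmetic. This is only a rough sketch: the function name `cf_prefix` is our own, and because rounding errors grow at every inversion, only the first handful of quotients can be trusted.

```python
from math import floor, sqrt


def cf_prefix(x, terms):
    """First few continued fraction quotients of x, computed in floating point."""
    quotients = []
    for _ in range(terms):
        q = floor(x)        # how many times 1 can be subtracted from x
        quotients.append(q)
        x = 1 / (x - q)     # rewrite the remainder as 1/(number greater than 1)
    return quotients


print(cf_prefix(sqrt(2), 8))  # [1, 2, 2, 2, 2, 2, 2, 2]
```

Such a computation proves nothing about nontermination; the algebraic argument above is what shows that the 2s continue forever.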


Exercises 


The simplest infinite continued fraction represents the famous number $\frac{1+\sqrt{5}}{2}$ known as the golden
ratio. The golden ratio is the ratio of the sides of the golden rectangle, shown in Fig. 2.3. Its
defining property is that the rectangle obtained by cutting off a square has the same shape as the
original.

When a square is cut off as shown in Fig. 2.3, the width of the rectangle that remains is of 
course the greater side minus the lesser. So, by repeating the process of cutting off squares, we can 
implement the Euclidean algorithm. 

2.7.1 Prove, from the defining property of the golden rectangle, that the golden ratio equals
$\frac{1+\sqrt{5}}{2}$.

2.7.2 Prove that the Euclidean algorithm does not terminate on the pair $\left(\frac{1+\sqrt{5}}{2}, 1\right)$ by considering
the golden rectangle.

2.7.3 Deduce that $\frac{1+\sqrt{5}}{2}$ is irrational.

2.7.4 What is the continued fraction for $\frac{1+\sqrt{5}}{2}$?


Periodic continued fractions, such as the one for $\frac{1+\sqrt{5}}{2}$, can be evaluated by showing that they
satisfy quadratic equations. Here is another example. Let

$$x = 3 + \cfrac{1}{3 + \cfrac{1}{3 + \ddots}}$$

2.7.5 Show that $x$ satisfies the equation $x^2 = 3x + 1$, and hence find $x$.


Fig. 2.3 The golden rectangle


2.8 Convergence of Continued Fractions


The nested interval property from Sect. 2.6 may be used to clarify the nature of
infinite continued fractions, which we introduced in the previous section without
investigating exactly what they mean. We define the infinite continued fraction

$$a_0 + \cfrac{1}{a_1 + \cfrac{1}{a_2 + \ddots}} \quad \text{for any positive integers } a_0, a_1, a_2, \ldots,$$

to be the limit of the sequence of finite continued fractions

$$c_0 = a_0, \quad c_1 = a_0 + \frac{1}{a_1}, \quad c_2 = a_0 + \cfrac{1}{a_1 + \cfrac{1}{a_2}}, \quad \ldots,$$


which are known as the convergents of the infinite continued fraction. We are going 
to show that this sequence does indeed converge, by capturing its limit in a nested 
sequence of closed intervals with a size that tends to zero. 

We let $P_n$, $Q_n$ be the relatively prime integers with the ratio $c_n$; that is,

$$\frac{P_n}{Q_n} = a_0 + \cfrac{1}{a_1 + \cfrac{1}{\ddots + \cfrac{1}{a_n}}} \quad \text{for positive integers } a_0, a_1, \ldots, a_n.$$

In particular, $P_0 = a_0$, $Q_0 = 1$ and $P_1 = a_0 a_1 + 1$, $Q_1 = a_1$. For $n \ge 2$ we will
express $P_n$ in terms of $P_{n-1}$ and $P_{n-2}$, and $Q_n$ in terms of $Q_{n-1}$ and $Q_{n-2}$, by simple
recurrence relations which we prove by induction on $n$.

These relations will enable us to inductively prove various properties of the
fractions $P_n/Q_n$, and thereby explain their convergence.


Recurrence relations for the convergents. If $P_n$ and $Q_n$ are the relatively prime
integers such that

$$\frac{P_n}{Q_n} = a_0 + \cfrac{1}{a_1 + \cfrac{1}{\ddots + \cfrac{1}{a_n}}} \quad \text{for positive integers } a_0, a_1, a_2, \ldots,$$

then

$$P_0 = a_0, \quad P_1 = a_0 a_1 + 1, \quad \text{and} \quad P_n = a_n P_{n-1} + P_{n-2} \text{ for } n \ge 2;$$

$$Q_0 = 1, \quad Q_1 = a_1, \quad \text{and} \quad Q_n = a_n Q_{n-1} + Q_{n-2} \text{ for } n \ge 2.$$
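Before proving the recurrences, it may help to see them at work. The sketch below is our own illustration, with hypothetical function names `convergents` and `direct`; it computes $P_n$ and $Q_n$ from the recurrences and compares them with the finite continued fractions evaluated directly.

```python
from fractions import Fraction


def convergents(a):
    """Pairs (P_n, Q_n) for partial quotients a[0], a[1], ... (needs len(a) >= 2)."""
    P = [a[0], a[0] * a[1] + 1]
    Q = [1, a[1]]
    for n in range(2, len(a)):
        P.append(a[n] * P[n - 1] + P[n - 2])
        Q.append(a[n] * Q[n - 1] + Q[n - 2])
    return list(zip(P, Q))


def direct(a):
    """Evaluate a[0] + 1/(a[1] + 1/(... + 1/a[-1])) from the bottom up."""
    value = Fraction(a[-1])
    for q in reversed(a[:-1]):
        value = q + 1 / value
    return value


a = [2, 1, 2, 2]  # the quotients found for 19/7
for n, (p, q) in enumerate(convergents(a)):
    c = direct(a[: n + 1])
    # Fraction is always in lowest terms, so this also checks that P_n, Q_n
    # are relatively prime.
    print(n, (p, q), (c.numerator, c.denominator))
```

Comparing numerators and denominators separately, rather than just the ratios, checks the claim that the recurrences deliver the fraction already in lowest terms.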


Proof. The values of $P_0, P_1, Q_0, Q_1$ are easy to check, and it is almost as easy to
check the recurrence relations for $n = 2$ (exercise). Now suppose that the relations
hold for $n = m - 1$, that is, for any sequence of $m$ positive integers $a_0, a_1, \ldots, a_{m-1}$.
To prove them for all $n$, by induction, it suffices to show that they hold for $n = m$.

To do this we first define relatively prime integers $P'_j$, $Q'_j$ by

$$\frac{P'_j}{Q'_j} = a_1 + \cfrac{1}{a_2 + \cfrac{1}{\ddots + \cfrac{1}{a_j}}} \quad \text{for } j = 1, 2, 3, \ldots.$$

Since the recurrences are supposed to hold for any sequence of $m$ positive integers,
including $a_1, a_2, \ldots, a_m$, we have

$$P'_m = a_m P'_{m-1} + P'_{m-2} \quad \text{and} \quad Q'_m = a_m Q'_{m-1} + Q'_{m-2}. \tag{*}$$

Now the relation between the $P/Q$ fractions and the $P'/Q'$ fractions is

$$\frac{P_j}{Q_j} = a_0 + \frac{1}{P'_j/Q'_j} = a_0 + \frac{Q'_j}{P'_j} = \frac{a_0 P'_j + Q'_j}{P'_j},$$

and we notice (using the Euclidean algorithm) that

$$\gcd(a_0 P'_j + Q'_j, P'_j) = \gcd(P'_j, Q'_j) = 1,$$

so

$$P_j = a_0 P'_j + Q'_j \quad \text{and} \quad Q_j = P'_j. \tag{**}$$

Taking $j = m$ in (**), then applying (*), gives

$$\begin{aligned}
P_m &= a_0 P'_m + Q'_m \\
    &= a_0 (a_m P'_{m-1} + P'_{m-2}) + a_m Q'_{m-1} + Q'_{m-2} \\
    &= a_m (a_0 P'_{m-1} + Q'_{m-1}) + a_0 P'_{m-2} + Q'_{m-2}, \\
Q_m &= P'_m \\
    &= a_m P'_{m-1} + P'_{m-2}.
\end{aligned} \tag{***}$$

Also, taking $j = m - 1$ and $j = m - 2$ in (**) gives

$$P_{m-1} = a_0 P'_{m-1} + Q'_{m-1} \quad \text{and} \quad Q_{m-1} = P'_{m-1},$$

$$P_{m-2} = a_0 P'_{m-2} + Q'_{m-2} \quad \text{and} \quad Q_{m-2} = P'_{m-2}.$$

The latter equations allow us to replace all the primed terms in (***) and they
become the required recurrence relations for $n = m$:

$$P_m = a_m P_{m-1} + P_{m-2} \quad \text{and} \quad Q_m = a_m Q_{m-1} + Q_{m-2}. \qquad \Box$$


From the recurrence relations we quickly obtain some properties of the integers
$P_n$, $Q_n$ that enable us to prove that the convergents $P_n/Q_n$ indeed converge.


1. Since $Q_0 = 1$, $Q_1 = a_1$, $Q_n = a_n Q_{n-1} + Q_{n-2}$, and $a_0, a_1, a_2, \ldots$ are positive
integers, it follows by an easy induction that $Q_n$ grows with $n$ and hence $Q_n \ge n$.

2. Another induction (exercise) shows that $P_n Q_{n-1} - Q_n P_{n-1} = (-1)^{n-1}$, whence it
follows that

$$\frac{P_n}{Q_n} - \frac{P_{n-1}}{Q_{n-1}} = \frac{(-1)^{n-1}}{Q_n Q_{n-1}}.$$

3. This implies that

$$\frac{P_0}{Q_0} < \frac{P_2}{Q_2} < \frac{P_4}{Q_4} < \cdots < \frac{P_5}{Q_5} < \frac{P_3}{Q_3} < \frac{P_1}{Q_1}$$

and (because of 1)

$$\left| \frac{P_n}{Q_n} - \frac{P_{n-1}}{Q_{n-1}} \right| \le \frac{1}{n(n-1)}.$$


The last of these results shows that the closed intervals bounded by $P_n/Q_n$ and
$P_{n-1}/Q_{n-1}$ are nested and of length tending to 0. Thus they have a unique common
point, $\lim_{n \to \infty} P_n/Q_n$, which is the value of the infinite continued fraction.
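These three properties are easy to test on an example. The sketch below is our own illustration, not a proof: it uses the partial quotients 1, 2, 2, 2, ... found earlier for $\sqrt{2}$, and the helper name `convergent_lists` is hypothetical.

```python
from fractions import Fraction


def convergent_lists(a):
    """Lists P, Q built from the recurrences (needs len(a) >= 2)."""
    P = [a[0], a[0] * a[1] + 1]
    Q = [1, a[1]]
    for n in range(2, len(a)):
        P.append(a[n] * P[n - 1] + P[n - 2])
        Q.append(a[n] * Q[n - 1] + Q[n - 2])
    return P, Q


a = [1] + [2] * 11  # a 1 followed by 2s: twelve partial quotients in all
P, Q = convergent_lists(a)
c = [Fraction(p, q) for p, q in zip(P, Q)]
N = len(a)

# Property 1: Q_n >= n.
grows = all(Q[n] >= n for n in range(N))
# Property 2: P_n Q_{n-1} - Q_n P_{n-1} = (-1)^(n-1).
identity = all(P[n] * Q[n - 1] - Q[n] * P[n - 1] == (-1) ** (n - 1) for n in range(1, N))
# Property 3: even convergents increase, odd ones decrease, gaps shrink.
evens_up = all(c[n] < c[n + 2] for n in range(0, N - 2, 2))
odds_down = all(c[n] > c[n + 2] for n in range(1, N - 2, 2))
small_gaps = all(abs(c[n] - c[n - 1]) <= Fraction(1, n * (n - 1)) for n in range(2, N))

print(grows, identity, evens_up, odds_down, small_gaps)
print(c[-1], float(c[-1]))  # last convergent and its decimal approximation
```

The bound $1/n(n-1)$ is very generous here; the actual gaps $1/Q_n Q_{n-1}$ shrink much faster because $Q_n$ grows geometrically for this particular fraction.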
Exercises


2.8.1 Show that

$$a_0 + \cfrac{1}{a_1 + \cfrac{1}{a_2}} = \frac{a_0(a_1 a_2 + 1) + a_2}{a_1 a_2 + 1},$$

and use the Euclidean algorithm to show that the numerator and denominator on the right
are relatively prime.

2.8.2 Conclude from Exercise 2.8.1 that $P_2 = a_0(a_1 a_2 + 1) + a_2$ and $Q_2 = a_1 a_2 + 1$, and deduce
that

$$P_2 = a_2 P_1 + P_0 \quad \text{and} \quad Q_2 = a_2 Q_1 + Q_0.$$

2.8.3 Prove $P_n Q_{n-1} - Q_n P_{n-1} = (-1)^{n-1}$ by induction on $n$.


Two interesting special cases are the continued fractions

$$1 + \cfrac{1}{1 + \cfrac{1}{1 + \ddots}} \quad \text{and} \quad \cfrac{1}{2 + \cfrac{1}{2 + \cfrac{1}{2 + \ddots}}},$$

which represent the numbers $\frac{1+\sqrt{5}}{2}$ and $\sqrt{2} - 1$, respectively.

2.8.4 Use the recurrence relations to show that the convergents for $\frac{1+\sqrt{5}}{2}$ are ratios of successive
Fibonacci numbers.

2.8.5 Show that the convergents for $\sqrt{2} - 1$ are ratios of successive terms of the sequence
1, 2, 5, 12, 29, 70, 169, ..., in which each term is twice the previous term plus the term before
that.

2.8.6 From Exercise 2.8.5 deduce a result about successive convergents for $\sqrt{2}$.


2.9 Historical Remarks 


A slogan to sum up this chapter might be: the basic theory of R equals Greek 
mathematics + Infinity (or even Euclid + Infinity). The ancient Greeks gave 
us integral and rational numbers and the principle of induction by which their 
properties may be proved. They also gave us infinite processes for approaching 
irrational numbers, though they did not dare to complete them or “take them to the 
limit.” Thus Euclid, in his Elements, Book X, Proposition 2, gave nontermination 
of the Euclidean algorithm as a criterion for irrationality. He gave no example at 
this point, but his Proposition 5 of Book XIII immediately implies periodicity, and
hence nontermination, of the Euclidean algorithm on the pair $\frac{1+\sqrt{5}}{2}$, 1. This leads us
to the continued fraction representation

$$\frac{1+\sqrt{5}}{2} = 1 + \cfrac{1}{1 + \cfrac{1}{1 + \ddots}}$$

but it would not have been accepted by the Greeks, since it implies “completing” a
process that does not end. The Greeks accepted the “potential” infinity of a process,
but not the “actual” infinity of its completion.

Our use of the word “completion” to describe the creation of R from Q is 
appropriate, because it involves the simultaneous completion of infinitely many 
infinite processes. R turns out to be a perfect example of an actual infinity, because 
there is in fact no way to view it as a “potential infinity.” R must be comprehended 
in its totality or not at all. This remarkable discovery will be explained in Chap. 3, 
along with the contrasting discovery that Q can be viewed as a “potential infinity.” 

This was not known when Dedekind discovered the completion of Q by means 
of his cuts in 1858, but he was aware of the revolutionary nature of his discovery. In 
the first publication of his theory, he described the circumstances as follows. 


As a professor at the Polytechnic School in Zürich I found myself for the first time obliged
to lecture upon the elements of the differential calculus and felt more keenly than ever the 
lack of a really scientific foundation for arithmetic. In discussing the notion of the approach 
of a variable magnitude to a fixed limiting value, and especially in proving the theorem that 
every magnitude that grows continually, but not beyond all limits, must certainly approach a 
limiting value, I had recourse to geometric evidences. ... that this form of introduction into 
the differential calculus can make no claim to being scientific, no one will deny. For myself 
this feeling of dissatisfaction was so overpowering that I made the fixed resolve to keep 
meditating on the question till I should find a purely arithmetic and perfectly rigorous 
foundation ... I succeeded Nov. 24, 1858 


Dedekind (1872), pp. 1-2 


Dedekind’s desire to avoid “recourse to geometric evidences” in favor of a 
“purely arithmetic” foundation was part of a nineteenth century movement away 
from geometric foundations in mathematics. As we saw in Chap. 1, the Greeks took
the discovery of irrational quantities to mean that geometric magnitudes are more 
extensive than numbers, and for that reason they favored geometry as the foundation 
of mathematics. This attitude prevailed until the nineteenth century, partly because 
there was as yet no arithmetic model of the line. However, confidence in geometry 
was weakened by the discovery of non-Euclidean geometry in the 1820s, and the 
desire for arithmetic foundations was correspondingly strengthened. 

The creation of the number line by completion of Q was a big step towards an 
arithmetic foundation for analysis, but further digging was required. Q itself lacked a 
proper foundation as long as basic results, such as ab = ba, were justified by appeal 
to geometric intuition. Truly arithmetic proofs of the basic results had to wait for the 
method of induction to mature beyond the sporadic descent arguments that occur in 
Euclid. 


Fig. 2.4 Hermann Grassmann and Richard Dedekind 


The first, rough, idea of proving properties of numbers by establishing them for 
1 and working upwards seems to occur in the work of Levi ben Gershon (1321). He 
used the idea to prove the basic formulas for permutations and combinations, such 
as the fact that there are n! permutations of n things. Induction proofs in almost the 
modern ascent format—a base step that establishes a property for n = 1 (or some 
other initial value), and an induction step showing that the property propagates from 
n to n + 1—occur in Pascal (1654), a book that introduced the so-called “Pascal’s 
triangle” to European readers. 

By the nineteenth century, this form of induction was in common use, but it took 
a brilliant mathematical outsider to see that induction was the absolute foundation 
of arithmetic. This was the message of the Grassmann (1861) inductive proof of the 
ring properties of the integers, though it went unnoticed by most mathematicians. 
The rediscovery of Grassmann’s results by Dedekind (1888) and Peano (1889) 
confirmed the importance and naturalness of his idea. The subsequent development 
of set theory, as we will see in Chap. 6, not only reaffirmed the importance of
induction, but also showed that it extends to all kinds of infinity. 


2.9.1 R as a Complete Ordered Field


The concept of a complete Archimedean ordered set is motivated by our geometric 
intuition of the line, which was also the ancient Greek intuition. The Archimedean 
property, as its name suggests, was mentioned by Archimedes. But before him it 
was stated by Euclid: 
Two unequal magnitudes being set out, if from the greater there be subtracted a magnitude
greater than its half, and from that which is left a magnitude greater than its half, and if this 
process be repeated continually, there will be left some magnitude which will be less than 
the lesser magnitude set out. 


Elements, Book X, Proposition 1. 


The concept of field is more modern and of algebraic origin. Although the 
Greeks essentially knew the field Q, their geometric concept of product did not 
allow unlimited multiplication of magnitudes, so there was no “field of magnitudes.” 
The concept of field developed in parallel with the development of algebra from 
the sixteenth century onwards, as mathematicians gradually became conscious of 
the rules for adding and multiplying numbers and symbolic expressions. 

The concept of an ordered field, and its specialization to the complete case, was 
considered by Dedekind (1872). However, it emerged more dramatically from a 
surprising development of the 1890s: the geometrization of algebra. Against the 
general tide of arithmetization, Hilbert (1899) showed that the nine properties 
defining a field, 


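$$\begin{aligned}
a + b &= b + a & ab &= ba \\
a + (b + c) &= (a + b) + c & a(bc) &= (ab)c \\
a + 0 &= a & a \cdot 1 &= a \\
a + (-a) &= 0 & a \cdot a^{-1} &= 1 \text{ for } a \ne 0
\end{aligned}$$

$$a(b + c) = ab + ac,$$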


are equivalent to four geometric axioms. Updating the approach of Euclid, Hilbert 
introduced undefined objects called “points” and “lines,” subject to the following 
axioms: 


1. Through any two points there is a unique line.

2. Any two lines meet in a unique point.

3. There are four points, no three of which lie on the same line.

4. If points A, B, C, D, E, F lie alternately on two lines, then the intersections of
the lines AB and DE, BC and EF, CD and FA, lie on a line (shown dashed in
Fig. 2.5).


The first three axioms define what is called a projective plane, one line of which 
(chosen arbitrarily) is called the line at infinity. The intuition for these three axioms 
is that the line at infinity is the horizon and that lines meeting on the horizon are 
parallel. The fourth axiom is the theorem of Pappus, so-called because it becomes 
a theorem when the projective plane is supplied with coordinates. To be precise: 
in a plane with coordinates, lines have linear equations, and we can compute the 
intersections of the above lines and show that they lie on a line using the field 
properties. This is essentially the theorem proved by Pappus of Alexandria around 
300 CE. 
Fig. 2.5 The theorem of Pappus


Conversely, if a projective plane satisfies Axiom 4 we can define coordinates, 
with sum and product operations, and show that the nine field axioms are satisfied. 
Thus the part of the complete ordered field concept not originating in geometry turns 
out to have a geometric interpretation: the field concept is captured by the structure 
of a projective plane satisfying the Pappus theorem. Even more remarkably, the 
Pappus theorem can be held responsible specifically for ab = ba. This is because 
there is a weaker geometric theorem (implied by Pappus), called the Desargues 
theorem, which implies all of the field properties except ab = ba. 

So, to return to the question raised in Sect. 1.1, it is not unreasonable to seek a 
geometric explanation of ab = ba. But if you want the other field properties, Pappus 
is a better explanation than Euclid! 


Chapter 3 
Infinite Sets 


PREVIEW 


The construction of the continuous set R by “filling the gaps” in the set Q of rational 
numbers seems completely natural and simple, in hindsight. However, there is a 
huge difference between Q and R. While both are infinite sets, R is “more infinite” 
than Q. The purpose of the present chapter is to explain precisely what this means, 
and to compare some other important infinite sets with Q and R. 

The “smallest” infinite sets are those called countably infinite. The definitive 
example is the set of positive integers N = {1,2,3,4,...}. A set is called countable 
if its members can be arranged in a (possibly infinite) list: 1st member, 2nd member, 
3rd member, ..., so that each member occurs at some positive integer position. 
Perhaps surprisingly, the rational numbers can be arranged in such a list, so Q is 
countably infinite. In fact, any set with members that have “finite descriptions” in 
some reasonable sense turns out to be countable. 

This is not the case for the real numbers, many of which require infinite 
descriptions, such as infinite decimals. Indeed, there are some dramatic proofs that 
R is not countable. We give a couple of these uncountability proofs, and also find 
several sets that are “equinumerous” with R, and hence also uncountable.

The uncountability of R leads us to expect some difficulties and surprises when 
we come to investigate sets of real numbers. To prepare for what is in store, we 
prove some classical theorems about sets of real numbers and introduce a notable 
example—the so-called Cantor set. 


3.1 Countably Infinite Sets 


A set is said to be countable if its members can be enumerated—first member, 
second member, third member, and so on—and each member eventually appears 
in the enumeration. Notice that we do not assume that the enumeration ever comes 
to an end. If it does not, the set is called countably infinite. 
The definitive example of a countably infinite set is the set of positive integers

$$\mathbb{N} = \{1, 2, 3, 4, 5, \ldots\}.$$


Another way to say that a set S is countably infinite is to say that the members of S 
can be put in one-to-one correspondence with the members of N: the first member 
of S corresponds to 1, the second member to 2, and so on. A little less formally, a 
set is countably infinite if its members can be arranged in an infinite list: 


first member, second member, third member, .... 


There are many examples of countably infinite sets, the most important of which 
are the following 


1. Any infinite subset of N, because (by induction) any such subset has a least 
member, second least member, third least member, and so on indefinitely. 
Examples are {even numbers}, {squares}, and {primes}. 

2. The set of integers, $\mathbb{Z} = \{\ldots, -3, -2, -1, 0, 1, 2, 3, \ldots\}$. Z is countable because its
members can be listed as follows:

$$0, 1, -1, 2, -2, 3, -3, \ldots$$

(That is, begin with 0 and then alternate members of N with their negatives.)

3. The set of rational numbers between 0 and 1. These numbers can be arranged in
the following list:

$$\frac{1}{2}, \frac{1}{3}, \frac{2}{3}, \frac{1}{4}, \frac{3}{4}, \frac{1}{5}, \frac{2}{5}, \frac{3}{5}, \frac{4}{5}, \frac{1}{6}, \frac{5}{6}, \ldots$$

(That is, we first list the fractions with denominator 2, then those with denominator 3,
then those with denominator 4, and so on, including only those fractions
that are in lowest terms.)

4. The set $\mathbb{Q}^+$ of positive rational numbers. To list the members of this set we group
positive fractions according to the sum of their numerator and denominator: first
those with sum equal to 2, then those with sum equal to 3, and so on. Within
each group we list the fractions in increasing order of their numerators, again
including only fractions in lowest terms. Then the list begins

$$\frac{1}{1}, \frac{1}{2}, \frac{2}{1}, \frac{1}{3}, \frac{3}{1}, \frac{1}{4}, \frac{2}{3}, \frac{3}{2}, \frac{4}{1}, \frac{1}{5}, \frac{5}{1}, \ldots$$

5. The set Q of all rational numbers. To list the members of Q we first list 0, then
alternate members of $\mathbb{Q}^+$ with their negatives (the same trick we used to enumerate
Z, given the enumeration of N):

$$0, \frac{1}{1}, -\frac{1}{1}, \frac{1}{2}, -\frac{1}{2}, \frac{2}{1}, -\frac{2}{1}, \frac{1}{3}, -\frac{1}{3}, \ldots$$

6. The set $\overline{\mathbb{Q}}$ of all algebraic numbers, where an algebraic number is a root of a
polynomial equation with integer coefficients. A polynomial equation of degree $n$,

$$a_n x^n + a_{n-1} x^{n-1} + \cdots + a_1 x + a_0 = 0, \tag{*}$$

has at most $n$ distinct roots, so the main problem in enumerating algebraic
numbers is to enumerate all the polynomial equations (*), where $a_0, a_1, \ldots, a_n$
are integers.

To do this we consider the number

$$h = n + |a_n| + |a_{n-1}| + \cdots + |a_1| + |a_0|,$$

which is called the height of the equation (*). There are only a finite number of
equations of height $\le h$, so we can make a list of all equations (*) with integer
coefficients by first listing those of height 1, then those of height 2, and so on.

Then if we list, along with each equation, its finitely many roots, we obtain a
list of all algebraic numbers.

7. The set of all finite subsets of N. A list of finite subsets may be constructed
inductively as follows. At stage zero, list the empty set, $\emptyset$. Then, assuming all
subsets of $\{1, 2, \ldots, n\}$ have been listed by stage $n$, at stage $n + 1$ list all sets
obtained by inserting the number $n + 1$ in previous sets. Then all subsets of
$\{1, 2, \ldots, n, n + 1\}$ have been listed by the end of stage $n + 1$. The list therefore
looks like this:

$$\emptyset, \{1\}, \{2\}, \{1, 2\}, \{3\}, \{1, 3\}, \{2, 3\}, \{1, 2, 3\}, \ldots$$

8. The set $\mathbb{N}^{<\omega}$ of all finite sequences of positive integers. This seems like listing
finite subsets, except that order is important and elements may be repeated. So
it is more like the listing of polynomials above, and indeed we can use a similar
concept of “height.” We assign the $n$-element sequence $(a_1, a_2, \ldots, a_n)$ the height

$$n + a_1 + \cdots + a_n.$$


Then there are only a finite number of sequences of given height, so we can list 
all sequences by listing those of height 1, then those of height 2, and so on. 

As special cases, the sets of all ordered pairs, ordered triples, and so on, of 
members of a countable set are themselves countable. 
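Several of the listings above are concrete enough to be written as programs that generate the list one member at a time. The sketch below is our own illustration of examples 2, 4, and 8; the generator names are hypothetical, and each yields members in the order described in the text.

```python
from fractions import Fraction
from itertools import count, islice
from math import gcd


def integers():
    """Example 2: 0, then members of N alternating with their negatives."""
    yield 0
    for n in count(1):
        yield n
        yield -n


def positive_rationals():
    """Example 4: group by numerator + denominator, lowest terms only."""
    for total in count(2):
        for num in range(1, total):      # increasing order of numerators
            den = total - num
            if gcd(num, den) == 1:
                yield Fraction(num, den)


def sequences_of_height(h):
    """Example 8: all (a_1, ..., a_n) of positive integers with n + a_1 + ... + a_n == h."""

    def compositions(total, parts):
        # Ordered ways to write total as a sum of `parts` positive integers.
        if parts == 0:
            if total == 0:
                yield ()
            return
        for first in range(1, total - parts + 2):
            for rest in compositions(total - first, parts - 1):
                yield (first,) + rest

    for n in range(1, h // 2 + 1):       # n terms need a sum of at least n
        yield from compositions(h - n, n)


print(list(islice(integers(), 7)))
print([str(q) for q in islice(positive_rationals(), 9)])
print([list(sequences_of_height(h)) for h in range(2, 6)])
```

Each generator visits every member after finitely many steps, which is exactly what countability requires; none of them ever finishes, since the sets are infinite.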


3.1.1 The Universal Library 


The last example of a countable set above can be interpreted more dramatically. 
A word, a sentence, even a whole book is nothing but a finite sequence of symbols, 
which we could encode by natural numbers (indeed we need only finitely many 
symbols if we include all the letters of the alphabet, punctuation symbols, and the
blank space). Therefore, the list of finite sequences of natural numbers in principle
includes every book that has been, or will ever be, written. This leads to the idea of
a universal library, which has been a plaything for several writers. One of the most
eloquent was the set theorist (and sometime dramatist) Felix Hausdorff, who wrote
in his Grundzüge der Mengenlehre of 1914, pp. 61-62 (my translation):


If one adds to the letters further elements such as punctuation marks, spaces, numerals, 
notes, etc., then one sees that the set of all books, catalogs, symphonies, and operas is 
countable, and it remains countable when one allows countably many symbols (but only 
finitely many for each work). On the other hand, if one confines oneself to a finite number 
of symbols, and to works of a bounded length, say by allowing words no longer than one 
hundred letters and books of no more than one million words, then the set is finite. And if 
one supposes, with Giordano Bruno, an infinite number of worlds with speaking, writing,
and music-making inhabitants, then it follows with mathematical certainty that in infinitely 
many of these worlds the same opera, with the same libretto, by a composer, librettist, 
conductor, and singers with the same names, will be performed. 


One might add that, if music is digitized, so that each performance becomes 
a finite sequence of bits, then the set of performances is countable and there is a 
universal music library. However, a more interesting question is: how big must the 
universal music library be if music is not digitized? A similar question is: how big 
must a library be to hold all possible handwritten manuscripts? We take up these 
questions in Sect. 3.4. 


Exercises 


An amusing model of countability, which apparently first appeared in the book One, Two, Three, 
...Infinity of Gamow (1947), is called Hilbert’s hotel. Hilbert’s hotel has a countable infinity of 
rooms—room 1, room 2, room 3, and so on—and sets are counted by packing them into Hilbert’s 
hotel, one member per room. Thus, the set N fills Hilbert’s hotel as shown in Fig. 3.1. 

Even though the hotel is full, it can make room for a new guest, say 0, by having each occupant 
move into the next room, as shown in Fig. 3.2. 


3.1.1 Explain how to make room for a countable infinity of new guests, such as —1, —2, —3, —4,.... 


Fig. 3.2 Making room for one more


Fig. 3.3 Rooms for the first busload


Now suppose that infinitely many buses arrive at the hotel, each carrying a countable infinity
of passengers. Suppose that the passengers $b_1, b_2, b_3, \ldots$ of the first bus are given rooms as shown
in Fig. 3.3.

That is, skip one room after the first passenger, skip two rooms after the second passenger, and 
so on. 


3.1.2 Find rooms for the second busload, in such a way that there remain blocks of 1, 2, 3, ...
empty rooms.

3.1.3 Deduce that it is possible to accommodate all passengers from all buses, so as to exactly fill
Hilbert’s hotel.


3.2 An Explicit Bijection Between N and N²


In the flurry of results in the previous section, we skimmed over one that is important
enough to study in some detail: a bijection between the set N of positive integers and
the set N² of ordered pairs of positive integers. This bijection is crucial to several
other bijections that we construct later, so it is important to be aware of it. And,
indeed, the bijection between N and N² can be made very clear, both pictorially and
by means of a simple quadratic function. We begin with a picture.

Figure 3.4 shows the points $(m, n)$ of N² in their usual grid arrangement, for small
values of $m$ and $n$. Also shown on the grid is a series of diagonal dotted lines that
show how to enumerate all the points in N².

We take (1, 1) as point number 1 on the list, then continue numbering points as
2, 3 on the first diagonal, then 4, 5, 6 on the next diagonal, and so on. The numbers
of the points are shown in gray. It is clear that each point eventually gets a number
with this scheme, so we have established a bijection between N and N².

Moreover, we can obtain a formula for the number of point $(m, n)$ as follows. The
point $(k - 1, 1)$ at the end of the $(k - 2)$nd diagonal obviously has number

$$1 + 2 + 3 + \cdots + (k - 1) = k(k - 1)/2.$$

One more step brings us to the point $(1, k)$ at the beginning of the $(k - 1)$st diagonal,
and another $m - 1$ steps along this diagonal brings us to the point $(m, k - m + 1)$.
Thus,

$$\text{number of point } (m, k - m + 1) = m + k(k - 1)/2.$$


Fig. 3.4 Enumerating the points in N²


Rewriting this formula in terms of $n = k - m + 1$, so $k = m + n - 1$, we get

$$\text{number of point } (m, n) = m + (m + n - 1)(m + n - 2)/2.$$

We will denote the number of point $(m, n)$ by $p(m, n)$ (so you can think of $p$ standing
for “pairing”).
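The pairing formula translates directly into code, and running it is a quick way to confirm that the diagonals are numbered without gaps or repetitions. In the sketch below the function `p` follows the formula in the text, while the inverse `unpair` is our own addition (it does not appear in the text) and recovers the diagonal from the number by an integer square root.

```python
from math import isqrt


def p(m, n):
    """Number of the point (m, n) in the diagonal enumeration."""
    return m + (m + n - 1) * (m + n - 2) // 2


def unpair(number):
    """Point (m, n) with p(m, n) == number (a helper not found in the text)."""
    # k = m + n - 1 is the largest integer with k(k - 1)/2 < number.
    k = (1 + isqrt(8 * number - 7)) // 2
    m = number - k * (k - 1) // 2
    return m, k - m + 1


# The first ten diagonals hold 1 + 2 + ... + 10 = 55 points.
numbers = sorted(p(m, n) for m in range(1, 11) for n in range(1, 12 - m))
print(numbers == list(range(1, 56)))
print([unpair(i) for i in range(1, 7)])
```

Integer division is safe in `p` because $(m + n - 1)(m + n - 2)$ is a product of consecutive integers and hence even.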


Exercises 


3.2.1 Find a polynomial bijection from N² to N. What is its degree?

3.2.2 Explain why there are countably many points in R² with rational coordinates.

3.2.3 Show that there are countably many circles in R² with rational center and rational radius.

3.2.4 Also, show that there are countably many circles with three rational points.


3.3 Sets Equinumerous with R


The countable sets studied in Sect. 3.1 (Z, Q, and $\overline{\mathbb{Q}}$) lie more and more densely
on the number line R, yet no method is apparent for listing all the members of R. 
The problem appears to be that the description of a real number is generally infinite, 
and the methods above are good only for listing objects with finite descriptions. 
Before attempting to prove that R is not countable, however, we will look at some 
interesting sets that are equinumerous with R. 
Fig. 3.5 Bijection between an interval and the line 


Definition. Sets are equinumerous, or of the same cardinality, if there is a one-to- 
one correspondence (bijection) between their elements. 


Thus, the countably infinite sets in the previous section are equinumerous with 
N, or of the same cardinality as N. The sets equinumerous with R include some 
subsets of R, some sets that contain R, and also some sets consisting of infinite 
objects derived from the countable set N. We sometimes say that such sets have 
continuum-many elements. 


1. The first example of a subset equinumerous with R is the open interval 
(−1, 1) = {x ∈ R : −1 < x < 1}. 

An easy way to see a bijection between (−1, 1) and R is to imagine (−1, 1) bent 
into a semicircle that rests on the number line at O, as shown in Fig. 3.5. Rays 
from the center of the semicircle establish a one-to-one correspondence between 
points of (−1, 1) and points of the line. 

A similar picture shows that any open interval 

(a, b) = {x ∈ R : a < x < b}, 

where a < b, is equinumerous with the whole number line. An interesting case, 
not needing the picture, is the interval (−π/2, π/2). The tan function maps this 
interval one-to-one onto R, as one can see from the graph of the tan function. 

2. The closed interval [0, 1] = {x ∈ R : 0 ≤ x ≤ 1} is equinumerous with the open 
interval (0, 1), and hence with R. We show this by constructing a bijection of 
[0, 1] onto (0, 1). Consider a sequence of distinct numbers r₁, r₂, r₃, ... (for 
example, rₙ = 1/(n + 1)) which belong to both [0, 1] and (0, 1). To map [0, 1] 
one-to-one onto (0, 1) we send 

0, 1, r₁, r₂, r₃, ... in [0, 1] 

respectively to 

r₁, r₂, r₃, r₄, r₅, ... in (0, 1), 

and send every other member of [0, 1] to itself. 
Similarly, R is equinumerous with any closed interval [a, b] with a < b. 

3. Another subset equinumerous with R is the set¹ R − Q of irrational numbers. 
As we saw in the previous section, the rational numbers can be enumerated 
r₁, r₂, r₃, .... From this list we can construct a countable infinity of irrational 
numbers, for example 

s₁ = √2 r₁, s₂ = √2 r₂, s₃ = √2 r₃, .... 

We can now define a one-to-one function from R onto R − Q by sending 

r₁, s₁, r₂, s₂, r₃, s₃, ... in R 

respectively to 

s₁, s₂, s₃, s₄, s₅, s₆, ... in R − Q 

and sending each other member of R to itself. 
4. The set P(N) of subsets of N. (The letter P stands for “power set” and means “all 
subsets of.” We say more about this operation in Chap. 6.) 

A subset S of N can be described by an infinite sequence of 0s and 1s, with 1 
in the nth place if and only if n ∈ S. Such a sequence can be interpreted as the 
binary expansion of a number in [0, 1]. The only problem is that different subsets 
can give the same number, for example 

{1} is described by 10000..., which gives the number 0.10000... = 1/2, 
{2, 3, 4, ...} is described by 01111..., which gives the number 0.01111... = 1/2. 

The numbers that correspond to different sets are the binary fractions m/2ⁿ, and 
the corresponding subsets of N are the finite sets and their complements. Leaving 
these exceptional numbers and sets aside for the moment, we have a bijection 
from 

[0, 1] − {binary fractions} onto P(N) − {finite sets and their complements}. 

Finally, combining this with a bijection between the countable sets 

{binary fractions} and {finite sets and their complements} 

gives a bijection of [0, 1] onto P(N). So P(N) is equinumerous with [0, 1], and 
hence with R. 

¹In this book we use an ordinary minus sign to denote set difference. This is convenient later to 
show the parallel between set difference and number difference in measure theory. In any case, it 
will always be clear what kind of objects we are taking the difference of. 

5. The set N × N × N × ··· = N^N of infinite sequences of positive integers. 
Each infinite sequence² (n₁, n₂, n₃, ...) of positive integers gives an irrational 
number 

$$\cfrac{1}{n_1 + \cfrac{1}{n_2 + \cfrac{1}{n_3 + \cdots}}}$$

between 0 and 1, because each rational has a finite continued fraction, as we 
saw in Sect. 2.7. Conversely, each irrational number between 0 and 1 has a 
continued fraction of the above form, and hence gives an infinite sequence of 
positive integers. 

Thus, we immediately have a bijection between N^N and the irrational numbers 
in (0, 1). The latter set is equinumerous with (0, 1), and hence with R, by an 
argument like that used to show that R − Q is equinumerous with R. 
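
Two of the bijections in this section are concrete enough to sketch in code. The 
Python below is an illustration only: it works with exact rationals (`Fraction`), it 
assumes the particular sequence rₙ = 1/(n + 1) for the map of [0, 1] onto (0, 1), 
and it handles only finite prefixes of a sequence (n₁, n₂, n₃, ...), since an infinite 
continued fraction cannot be stored. The function names are hypothetical. 

```python
from fractions import Fraction


def r(n):
    """Assumed sequence r_n = 1/(n + 1): r_1 = 1/2, r_2 = 1/3, ..."""
    return Fraction(1, n + 1)


def closed_to_open(x):
    """Map [0, 1] one-to-one onto (0, 1), for exact rational inputs."""
    x = Fraction(x)
    if x == 0:
        return r(1)
    if x == 1:
        return r(2)
    if x.numerator == 1:  # x = 1/(n + 1) = r_n with n >= 1
        n = x.denominator - 1
        return r(n + 2)
    return x  # every other member is sent to itself


def cf_value(terms):
    """Value of 1/(n1 + 1/(n2 + ... + 1/nk)) for positive integers n1..nk."""
    value = Fraction(0)
    for n in reversed(terms):
        value = 1 / (n + value)
    return value


def cf_terms(x, count):
    """Up to `count` leading terms n1, n2, ... for a rational x in (0, 1)."""
    x = Fraction(x)
    terms = []
    while x != 0 and len(terms) < count:
        x = 1 / x
        n = x.numerator // x.denominator  # integer part of x
        terms.append(n)
        x -= n
    return terms
```

For instance, `cf_value([2, 3, 4])` is 13/30 and `cf_terms(Fraction(13, 30), 10)` 
returns the terms 2, 3, 4 again. Longer and longer prefixes of an infinite sequence 
give rational values that approximate the corresponding irrational number. 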


Exercises 


3.3.1 Show that (0, 1) ∪ {1}, (0, 1) ∪ {1, 2}, and (0, 1) ∪ {1, 2, 3} are equinumerous with R. (Hint: 
Build a copy of “Hilbert’s Hotel” inside (0, 1).) 

3.3.2 Show that (0, 1) ∪ {1, 2, 3, ...} is equinumerous with R. 

3.3.3 Show that (0, 1) ∪ (2, 3) is equinumerous with R. 

3.3.4 Encode each (n₁, n₂, n₃, ...) ∈ N^N by an infinite sequence of 0s and 1s with infinitely many 
0s, and hence give another proof that N^N is equinumerous with R. 


3.4 The Cantor–Schröder–Bernstein Theorem 


The bijections in the previous section call for a certain amount of ingenuity, in 
order to construct bijections from maps that are not quite bijections. To avoid 
ever-increasing demands on our ingenuity in the future, we now prove a theorem 
that guarantees a bijection when we “almost” have one, namely, when we have a 
bijection from A to a subset of B, and one from B to a subset of A (or, equivalently, 
an injection from A into B, and one from B into A). 

²We use the notation (a, b, c, ...) for the infinite sequence in conformity with the notations (a, b) 
and (a, b, c) for ordered pairs and triples. 


Cantor–Schröder–Bernstein Theorem. If there are injections f : A → B and 
g : B → A, then there is a bijection h : A → B. 


Proof. Consider chains of elements, alternately in A and B, that are connected by 
alternate applications of f and g. A portion of a chain looks like this, 

$$\cdots \xrightarrow{g} a_0 \xrightarrow{f} b_0 \xrightarrow{g} a_1 \xrightarrow{f} b_1 \xrightarrow{g} a_2 \xrightarrow{f} b_2 \xrightarrow{g} \cdots$$

where ..., a₀, a₁, a₂, ... ∈ A and ..., b₀, b₁, b₂, ... ∈ B and f(aₖ) = bₖ, g(bₖ) = aₖ₊₁. 
Since f and g are functions on A and B, respectively, each chain extends indefinitely 
(and uniquely) to the right, possibly repeating the same finite sequence over and 
over. Since f and g are injective, each chain extends uniquely to the left too, though 
it may terminate, either at an a that is not in the range of g or at a b that is not in 
the range of f. 

The injectiveness of f and g also implies that any two chains with a common 
element aₖ (or bₖ) are identical. Hence distinct chains contain disjoint subsets of A 
and disjoint subsets of B. This enables us to set up a bijection h : A → B by piecing 
together the obvious bijections within each chain. 


1. For any aₖ in a chain without an initial element, let h(aₖ) = bₖ, in which case 
h⁻¹(bₖ) = aₖ. 

2. For any aₖ in a chain with initial element in A, again let h(aₖ) = bₖ, in which case 
h⁻¹(bₖ) = aₖ. 

3. For any chain with initial element in B, let h(aₖ) = bₖ₋₁, in which case 
h⁻¹(bₖ₋₁) = aₖ. 


It follows, from the partitioning of elements of A and B among the chains, that 
each a ∈ A is now paired with a unique h(a) ∈ B, and each b ∈ B is paired with a 
unique h⁻¹(b) ∈ A. □ 
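
The case analysis in the proof can be turned into a small procedure. The Python 
sketch below is only an illustration under a strong assumption: every chain is 
assumed to have an initial element, so that following a chain to the left always 
terminates (a chain with no initial element would make the loop run forever). The 
function names and the convention that the inverse maps return `None` outside the 
range are choices made for this sketch, not part of the text. 

```python
def csb_bijection(f, f_inverse, g_inverse):
    """Build h : A -> B from injections f : A -> B and g : B -> A.

    f_inverse(b) returns the a with f(a) = b, or None if b is not in the
    range of f; g_inverse(a) does the same for g.
    """
    def h(a):
        element, in_A = a, True
        # Follow the chain to the left until it terminates.
        while True:
            previous = g_inverse(element) if in_A else f_inverse(element)
            if previous is None:
                break
            element, in_A = previous, not in_A
        # Initial element in A: use f.  Initial element in B: use g inverse.
        return f(a) if in_A else g_inverse(a)
    return h


# Example with A = B = {0, 1, 2, ...} and f(n) = g(n) = n + 1.
def f(n):
    return n + 1


def shift_inverse(n):
    return n - 1 if n >= 1 else None


h = csb_bijection(f, shift_inverse, shift_inverse)
pairs = [(a, h(a)) for a in range(6)]
```

In this example neither injection is onto, yet the resulting h is a bijection: it 
swaps 0 with 1, 2 with 3, 4 with 5, and so on. 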


3.4.1 More Sets Equinumerous with R 


Armed with the Cantor–Schröder–Bernstein theorem, we can now exhibit some 
spectacular examples of sets equinumerous with R. The main problem that 
Cantor–Schröder–Bernstein has to overcome is the ambiguity of decimal expansions. 
That is, the same number is sometimes represented by two different decimal 
expansions; for example, 1/2 is represented by both 0.5 and 0.49999.... It will 
be seen below that we can get around this problem by constructing injections and 
applying Cantor–Schröder–Bernstein, rather than attempting to construct bijections 
directly. However, the ambiguity of decimal expansions continues to cause trouble 
in the future, and we will eventually work with a different set of continuum 
cardinality, the set N^N introduced in Sect. 3.3, to avoid such difficulties. 

The first example is the plane R². When this example was discovered by Cantor 
in 1877 he wrote to Dedekind: “I see it but I don’t believe it,” apparently astonished 
that a two-dimensional continuum of points could be equinumerous with the 
one-dimensional continuum. (See Gouvêa 2011 for an engaging account of this episode.) 
We continue the numbering of examples from the previous section. 


6. The set R² of ordered pairs (x, y) of real numbers. 

We have an obvious injection of R into R² (namely, send each x ∈ R to the 
ordered pair (x, 0)), so it remains to find an injection of R² into R. We inject R² 
in two stages. 

First, map R² bijectively onto the “open unit square” (0, 1)² by mapping each 
R bijectively onto the open interval (0, 1) as in example 1 of the previous section. 

Now take any (x, y) ∈ (0, 1)² and consider the decimal expansions of x and y: 

x = 0.a₁a₂a₃..., y = 0.b₁b₂b₃... 

We can choose these decimal expansions uniquely by not allowing expansions 
ending with 999.... Hence we can send each pair to the well-defined decimal 
number 

z = 0.a₁b₁a₂b₂a₃b₃... 

Moreover, since z cannot end with 999... and hence equal another decimal 
expansion (because neither x nor y does), different pairs (x, y) give different 
numbers z. Thus, the map (x, y) ↦ z is an injection of (0, 1)² into R. Combining 
this injection with the bijection R² → (0, 1)² gives an injection R² → R, as 
required. 

7. The set R × R × R × ··· = R^N of sequences (x₁, x₂, x₃, ...) of real numbers. 

Again we have an obvious injection of R into R^N, by sending x to (x, 0, 0, ...), 
and it remains to find an injection of R^N into R. 

The first step is to map R^N bijectively onto the “infinite-dimensional open unit 
cube,” (0, 1)^N, consisting of the sequences (x₁, x₂, x₃, ...) where each xᵢ ∈ (0, 1). 
As in the previous example, this is done by mapping each R bijectively onto (0, 1) 
as in example 1 from the previous section. 

Now take any (x₁, x₂, x₃, ...) ∈ (0, 1)^N and consider the decimal expansions 
of x₁, x₂, x₃, ...: 

$$\begin{aligned}
x_1 &= 0.a_{11}a_{12}a_{13}\cdots \\
x_2 &= 0.a_{21}a_{22}a_{23}\cdots \\
x_3 &= 0.a_{31}a_{32}a_{33}\cdots
\end{aligned}$$

We choose these expansions uniquely by not allowing any expansion to end with 
999.... We can pack all the decimal digits aₘₙ in this array into the decimal 
expansion of a single number x by rearranging the array of digits into a list, just 
as we enumerated the ordered pairs (m, n) of natural numbers in Sect. 3.2. The 
resulting decimal expansion begins 

$$x = 0.a_{11}a_{21}a_{12}a_{31}a_{22}a_{13}a_{41}a_{32}a_{23}a_{14}\cdots$$

and in general aₘₙ is in place number p(n, m), where p is the quadratic function 
of m and n defined in Sect. 3.2. 

Each decimal place of x gets filled, so the sequence (x₁, x₂, x₃, ...) is sent 
to a well-defined x ∈ R. Moreover, x cannot end with 999..., because none of 
x₁, x₂, x₃, ... does, so different sequences give different numbers x. 

Thus, the map (x₁, x₂, x₃, ...) ↦ x is an injection from (0, 1)^N into R. 
Combining it with the bijection from R^N to (0, 1)^N gives an injection R^N → R, 
as required. 

8. The set of all continuous functions f : R → R. 

We will define and study continuous functions carefully in Chap. 4. But for 
now it is sufficient to know that a continuous function f : R → R is completely 
determined by its values on the set Q of rational numbers. It follows, since we 
can list Q as a sequence r₁, r₂, r₃, ..., that f is completely determined by the 
sequence 

(f(r₁), f(r₂), f(r₃), ...) ∈ R^N. 

We can therefore inject the set of real continuous functions f into R by combining 
the map f ↦ (f(r₁), f(r₂), f(r₃), ...) into R^N with the injection R^N → R found 
in the previous example. 

Conversely, we certainly have an injection of R into the set of continuous 
functions. Just send the real number c ∈ R to the constant function f(x) = c. 
Thus, it follows from the Cantor–Schröder–Bernstein theorem that the set of 
continuous functions f : R → R is equinumerous with R. 
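
The digit-shuffling injections of examples 6 and 7 can be tried out on finite 
prefixes of decimal expansions. The Python sketch below is an illustration only: it 
works with strings of digits after the decimal point, the function names are 
hypothetical, and the order used in `pack_digits` is the one in which the expansion 
of x is displayed in example 7 (a₁₁, a₂₁, a₁₂, a₃₁, a₂₂, a₁₃, ...). 

```python
def interleave(x_digits, y_digits):
    """Digits a1 b1 a2 b2 ... of z, from equally long prefixes of x and y."""
    if len(x_digits) != len(y_digits):
        raise ValueError("digit prefixes must have the same length")
    return "".join(a + b for a, b in zip(x_digits, y_digits))


def pack_digits(rows, count):
    """First `count` digits of x, where rows[m - 1][n - 1] is the digit a_mn."""
    digits = []
    total = 2  # the value of m + n on the current diagonal
    while len(digits) < count:
        for m in range(total - 1, 0, -1):
            digits.append(rows[m - 1][total - m - 1])
            if len(digits) == count:
                break
        total += 1
    return "".join(digits)


z_prefix = interleave("123", "456")
x_prefix = pack_digits(["1234", "5678", "9012", "3456"], 10)
```

Because each digit of x or y (or of each xₘ) lands in its own fixed place, the 
original expansions can be read back from the packed one, which is why the maps 
are injective once expansions ending in 999... are excluded. 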


3.4.2 The Universal Jukebox 


Let me begin with a quote from The Six Gateways of Knowledge, by Lord Kelvin, 
an address to the Birmingham and Midland Institute, delivered in the Town Hall, 
Birmingham, on October 3rd, 1883. It was later published in his Popular Lectures 
and Addresses, Kelvin (1889), volume 1, pp. 274-275. 


But now for what really to me seems a marvel of marvels: think what a complicated 
thing is the result of an orchestra playing .... Think of the condition of the air, how it 
is lacerated sometimes in a complicated effect. Think of the smooth gradual increase and 
diminution of pressure—smooth and gradual though taking place several hundred times in 
Fig. 3.6 The universal jukebox (points on the number line labeled “Goldberg Variations” and “Heartbreak Hotel”) 


a second—when a piece of beautiful harmony is heard! Whether, however, it be the single 
note of the most delicate sound of a flute, or the purest piece of harmony of two voices 
singing perfectly in tune; or whether it be the crash of an orchestra, and the high notes, 
sometimes even screechings and tearings of the air, which you may hear fluttering above 
the sound of the chorus—think of all that, and yet .... A single curve, drawn in the manner 
of the curve of prices of cotton, describes all that the ear can possibly hear, as the result of 
the most complicated musical performance. 


The phenomenon described by Kelvin, the superposition of sound waves into a 
single wave, had already been exploited by Edison when he first recorded sound in 
1877. By using the single sound wave to drive a vibrating needle, Edison transferred 
the wave onto a tinfoil-covered cylinder, from which the sound could be replayed by 
reversing the process. Even today, when digitized music is everywhere, many audiophiles 
prefer the analog sound captured by the continuous wave on the grooves of a vinyl 
disk. And vinyl disks are still played in old-style jukeboxes. 

If we accept that a perfectly faithful sound recording needs to be a continuous 
function, then the universal jukebox needs to be a repository of all continuous 
functions. The last example of the previous section shows, amazingly, that each 
continuous function may be encoded by a single real number. Thus, we can take the 
universal jukebox to be the number line, with each musical performance represented 
by a single point (Fig. 3.6). This is really the marvel of marvels! 


Exercises 


Many of our previous results on countable sets and sets equinumerous with R can be proved more 
simply with the help of the Cantor–Schröder–Bernstein theorem. 


3.4.1 Show that (m, n) ↦ 2ᵐ3ⁿ gives an injection N² → N, and hence show that N² is 
equinumerous with N. 
3.4.2 If p₁, p₂, p₃, ... are the prime numbers, show that the map 

$$(n_1, n_2, \ldots, n_k) \mapsto 2^{n_1} 3^{n_2} \cdots p_k^{n_k}$$

is an injection of {finite sequences of positive integers} into N. Hence show that 
{finite sequences of positive integers} is equinumerous with N. 

3.4.3 Prove that [0,1] is equinumerous with (0,1) by finding suitable injections from one to the 
other. 


Moreover, many results we previously had not contemplated become easy with the 
Cantor–Schröder–Bernstein theorem. For example: 

3.4.4 For an arbitrary S ⊆ R, show that (0, 1) ∪ S is equinumerous with R. 


An interesting application of the fact that continuous functions are determined by their values 
on Q is the following theorem of Cauchy (1821), pp. 104-106. If f : R → R is continuous and 
additive (that is, f(x + y) = f(x) + f(y) for all x, y ∈ R), then f(x) = ax for some constant a. 


3.4.5 If f(x+y) = f(x) + f(y) and f(1) = a, deduce that f(r) = ra for each rational number r. 
3.4.6 Deduce from Exercise 3.4.5 that, if f is continuous, then f(x) = ax. 


3.5 The Uncountability of R 


The set R of real numbers is not a countable set, because we can show that any 
countable set of real numbers is not all of R. This result shows the need for set 
theory as a theory of infinity, since different kinds of infinity exist. Here are two 
different (though distantly related) arguments for this famous result. 


3.5.1 The Diagonal Argument 


The first argument, due to Cantor (1891), takes any countable set S of real numbers 
and explicitly finds a number different from each member of S. 


Countable sets do not include all real numbers. If S is a countable set of real 
numbers, then there is a member of [0, 1] not in S. 


Proof. Suppose that S = {x₁, x₂, x₃, ...} is a countable set of real numbers. Each 
number xₙ can be written as an infinite decimal, and we imagine all of these decimal 
expansions listed in an infinite table, such as the following: 

x₁    0.11111... 
x₂    3.14159... 
x₃    1.23456... 
x₄    0.21212... 
x₅    1.41423... 

Ignoring the parts of each number before the decimal point, we construct a 
number x that differs from each xₙ by the simple expedient of making x different 
from xₙ in the nth decimal place. To be specific, let 

$$\text{$n$th decimal place of } x = \begin{cases} 2 & \text{if the $n$th decimal place of } x_n \text{ is } 1, \\ 1 & \text{if the $n$th decimal place of } x_n \text{ is not } 1. \end{cases}$$

This makes sure that x not only has a different decimal expansion from each xₙ; x is 
also a different number, because we have avoided the ambiguous numbers such as 
0.999... = 1.000... by using only the digits 1 and 2. For the numbers x₁, x₂, x₃, ... 
tabulated above, we have 

x = 0.21121... 

Thus, the countable set S does not include all members of R. Specifically, it does 
not include the number x in [0, 1]. □ 
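
The construction in the proof is easy to carry out on a finite portion of the 
table. The following Python sketch is an illustration only: it takes finitely many 
expansions, written as strings with enough digits after the decimal point, and 
applies the rule above to the nth decimal place of the nth entry. The function name 
`diagonal_number` is hypothetical. 

```python
def diagonal_number(expansions):
    """Decimal string differing from the nth expansion in the nth decimal place."""
    digits = []
    for n, expansion in enumerate(expansions, start=1):
        fractional_part = expansion.split(".")[1]
        nth_digit = fractional_part[n - 1]
        # Use 2 where the diagonal digit is 1, and 1 everywhere else.
        digits.append("2" if nth_digit == "1" else "1")
    return "0." + "".join(digits)


table = ["0.11111", "3.14159", "1.23456", "0.21212", "1.41423"]
x = diagonal_number(table)
```

For the five entries tabulated in the proof this gives the digits 2, 1, 1, 2, 1, 
the same beginning of x as found there. 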


The argument above is known as the diagonal argument because it uses the 
digits on the diagonal of the table of decimal expansions (the nth decimal digit 
of xₙ, for each n). There are many variations of the diagonal argument; indeed, it is 
hard to get away from it when proving that R is uncountable. 


3.5.2 The Measure Argument 


Again we show that a countable set of real numbers cannot fill the interval [0, 1], 
but this time the argument shows that a countable set falls far short of filling [0, 1]. 


Countable sets have small measure. If S is a countable set of real numbers, then 
a large fraction of the members of [0, 1] are not in S. 


Proof. Suppose that S = {x₁, x₂, x₃, ...}. Suppose that we enclose each number xₙ 
by an open interval Uₙ of width 1/10ⁿ. Then the amount of any interval, say [0, 1], 
covered by U₁, U₂, U₃, ... is at most 

$$\frac{1}{10} + \frac{1}{10^2} + \frac{1}{10^3} + \cdots = \frac{1}{9}.$$

Thus, not all of [0, 1] is covered, so there are real numbers in [0, 1] that are not in S. 

In fact, at least 8/9 of the interval is not in S. Moreover, we could rerun the 
argument with 1/100 (or 1/1000, 1/10000, and so on) in place of 1/10 to conclude 
that the fraction of [0, 1] not in S is arbitrarily close to 1. □ 
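
The bound on the covered length can be checked with exact arithmetic. The Python 
sketch below is an illustration only; the function name `covered_length` is 
hypothetical. It adds up the widths of the first few intervals when the nth 
interval has width 1/bⁿ, for b = 10 and for b = 100. 

```python
from fractions import Fraction


def covered_length(width_base, count):
    """Total length 1/b + 1/b^2 + ... + 1/b^count of the first `count` intervals."""
    return sum(Fraction(1, width_base**n) for n in range(1, count + 1))


# With b = 10 the partial sums stay below 1/9; with b = 100 they stay below 1/99.
gap_base_10 = Fraction(1, 9) - covered_length(10, 20)
gap_base_100 = Fraction(1, 99) - covered_length(100, 20)
```

Both gaps are positive and shrink rapidly as more intervals are included, so the 
total length covered never exceeds 1/9 in the first case and 1/99 in the second. 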


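When the argument above is applied to the countable set of rational numbers 
between 0 and 1, 

$$S = \left\{\tfrac{1}{2}, \tfrac{1}{3}, \tfrac{2}{3}, \tfrac{1}{4}, \tfrac{3}{4}, \tfrac{1}{5}, \tfrac{2}{5}, \tfrac{3}{5}, \tfrac{4}{5}, \tfrac{1}{6}, \tfrac{5}{6}, \ldots\right\},$$
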

we conclude that irrational numbers exist in [0, 1]. Of course, we already knew this, 
but it is surprising that any number can escape being covered by any of the intervals 
Uₙ. After all, the rational numbers lie densely on the line, so intuition may suggest 
that covering each rational with an interval will cover all of [0, 1]. 

Summing the infinite series 1/10 + 1/10² + 1/10³ + ··· seems to refute this naive intuition, 
and there are two ways to obtain a clearer view. 

1. If the intervals U₁, U₂, U₃, ... cover all of [0, 1] (somehow, despite their small 
total length), then we can show that finitely many of these intervals, say 
U₁, U₂, ..., Uₘ, also cover [0, 1]. A theorem to this effect is proved in the next 
section. Since the total length of U₁, U₂, ..., Uₘ is less than 1/9, this is clearly 
absurd. 

2. Since U₁ has length 1/10, one of the decimal fractions 

0.0, 0.1, 0.2, ..., 0.9, 

does not lie in U₁. Choose one of these fractions, say 0.2. Then, since U₂ has 
length 1/100, one of the decimal fractions 

0.20, 0.21, 0.22, ..., 0.29 

does not lie in U₂. Choose one of these fractions, and continue. At stage n we 
add an nth decimal place that keeps our number out of Uₙ. The infinite decimal 
thus obtained (with due precautions to avoid ambiguous decimals) therefore lies 
outside all of the intervals U₁, U₂, U₃, .... 


The second way of clarifying the measure argument is a strong hint at the diagonal 
argument. Indeed, the diagonal argument is precisely what comes to mind when one 
tries to find a specific number not covered by the intervals U₁, U₂, U₃, .... 


Exercises 


Cantor first applied the uncountability of R to prove the existence of transcendental numbers; that 
is, nonalgebraic numbers. 


3.5.1 Explain how the existence of transcendental numbers follows from the uncountability of R. 
3.5.2 Show, in fact, that “almost all” real numbers are transcendental. 


The key idea of the diagonal argument (making a new object x that differs from the nth given 
object xₙ in the nth place) works even better with subsets of N and sequences in N than it does 
with real numbers (because the problem of “ambiguous objects” does not arise). 


3.5.3 Given S₁, S₂, S₃, ... ⊆ N, explain how to define an S ⊆ N such that S ≠ each Sₙ. 
3.5.4 Given f₁, f₂, f₃, ... ∈ N^N, explain how to define an f ∈ N^N such that f ≠ each fₙ. 


The uncountability of R, and hence of P(N), leads to some surprising results about subsets of N. 
Here are two such results, based on associating real numbers with sets or sequences of rationals, 
and exploiting the countability of Q. 


3.5.5 Show that there are uncountably many sets S_x ⊆ N such that, for any S_x, S_y, either 
S_x ⊆ S_y or S_y ⊆ S_x. 

3.5.6 Show that there are uncountably many sets T_x ⊆ N, any two of which have only a finite 
intersection. 


Cantor’s first uncountability proof (in 1874) relied on the nested interval property from 
Sect. 2.6. Given countably many real numbers x₁, x₂, x₃, ..., he found an x ≠ x₁, x₂, x₃, ... in 
nested intervals defined as follows. Let a₁ = x₁, 

b₁ = first xᵢ beyond a₁ such that xᵢ > a₁, 
a₂ = first xᵢ beyond b₁ such that a₁ < xᵢ < b₁, 
b₂ = first xᵢ beyond a₂ such that a₁ < a₂ < xᵢ < b₁, 

and so on. Thus, [a₁, b₁] ⊃ [a₂, b₂] ⊃ ···. 


3.5.7 If the sequence of intervals is finite, conclude that there is an x ≠ x₁, x₂, .... 
3.5.8 If the sequence of nested intervals is infinite, conclude that any of its common points x ≠ 
x₁, x₂, .... 


3.6 Two Classical Theorems About Infinite Sets 


The concept of a limit point, introduced in Sect. 2.6 in the case of infinite sequences, 
has an important generalization to infinite sets. 


Definition. A point x is a limit point of a set S if, for any ε > 0, there is a point of 
S other than x within distance ε of x. 


Limit points play an important role in mathematics, particularly in the two 
theorems below, which are crucial to later developments. These theorems illustrate 
how the concept of limit point is intimately related to the concept of infinite set, 
which we now know includes uncountable sets. For the sake of definiteness, we 
state the theorems as properties of the interval [0,1], but they apply to any closed 
interval. 


Bolzano–Weierstrass Theorem. Any infinite set S of points in [0, 1] has a limit 
point in [0, 1]. 


Proof. Since I = [0, 1] contains infinitely many points of S, so does (at least) one 
half of I, either [0, 1/2] or [1/2, 1]. To be specific, let 

I₁ = leftmost half of I that contains infinitely many points of S. 

Similarly, let 

I₂ = leftmost half of I₁ that contains infinitely many points of S, 
I₃ = leftmost half of I₂ that contains infinitely many points of S, 

and so on. Then I, I₁, I₂, ... is a nested sequence of closed intervals, with lengths that 
become arbitrarily small, and hence with a common point x by the nested interval 
property of Sect. 2.6. 

The point x is a limit point of S because in each Iₙ there are points of S other 
than x, and hence such points are arbitrarily close to x. □ 
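
The repeated halving in this proof can be imitated in code, provided we are 
handed a way to decide whether a closed interval contains infinitely many points 
of S. No such test exists for sets in general, so the Python sketch below takes it 
as a user-supplied function; that oracle, the function names, and the example set 
S = {1, 1/2, 1/3, ...} are all assumptions made for illustration. 

```python
from fractions import Fraction


def bisect_toward_limit_point(has_infinitely_many, steps):
    """Return the interval reached after `steps` halvings, always taking the
    leftmost half that contains infinitely many points of S.

    has_infinitely_many(lo, hi) is a user-supplied oracle for the set S.
    """
    lo, hi = Fraction(0), Fraction(1)
    for _ in range(steps):
        mid = (lo + hi) / 2
        if has_infinitely_many(lo, mid):
            hi = mid  # the left half qualifies, so it is the leftmost one
        else:
            lo = mid  # otherwise the right half must qualify
    return lo, hi


# Example oracle for S = {1, 1/2, 1/3, ...}: a closed interval [lo, hi] with
# 0 <= lo < hi contains infinitely many points of S exactly when lo == 0.
def oracle(lo, hi):
    return lo == 0


interval = bisect_toward_limit_point(oracle, 10)
```

For this S the intervals shrink toward 0, which is indeed a limit point of S even 
though it is not a member of S. 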




The next theorem concerns open intervals (a, b), exploiting the property that 
(a, b) contains, along with any member x, any sufficiently small interval 
containing x. 

A set of open intervals may be uncountable, so when we denote a member of the 
set by Uᵢ we allow the index i to range over a possibly uncountable set. We say that 
intervals Uᵢ cover [0, 1] if [0, 1] is contained in the union of the Uᵢ. 


Heine–Borel Theorem. If [0, 1] is covered by infinitely many open intervals Uᵢ, 
then [0, 1] is covered by finitely many of the Uᵢ. 


Proof. Suppose on the contrary that I = [0, 1] can be covered only by infinitely 
many of the intervals Uᵢ. Then it follows that some half of I, either [0, 1/2] or 
[1/2, 1], also can be covered only by infinitely many of the Uᵢ. As in the previous 
proof, we make a specific choice: 

I₁ = leftmost half of I that can be covered only by infinitely many Uᵢ. 

And similarly: 

I₂ = leftmost half of I₁ that can be covered only by infinitely many Uᵢ, 
I₃ = leftmost half of I₂ that can be covered only by infinitely many Uᵢ, 

and so on. In this way we obtain a nested sequence of closed intervals I, I₁, I₂, ..., 
none of which can be covered by finitely many of the Uᵢ. Since I, I₁, I₂, ... become 
arbitrarily small, there is one point x common to them all, as in the previous proof. 

But x belongs to some Uᵢ (since the Uᵢ cover all points of [0, 1]); call it Uⱼ. Since 
Uⱼ is open, it covers any sufficiently small Iₙ along with x. This contradicts the 
assumption that each Iₙ cannot be covered by finitely many of the Uᵢ. 

Therefore, our original assumption, that [0, 1] cannot be covered by finitely many 
of the Uᵢ, is false. □ 


The properties of [0, 1] proved in the two theorems above reflect what is called 
its compactness. This property does not hold for the open interval (0, 1), or for the 
whole line R. 


Definition. A set K is called compact if any cover of K by open intervals has a 
finite subcover. 


The Heine–Borel theorem also enables us to clear up the problem raised in the 
second proof of uncountability in the previous section: whether the interval [0, 1] 
can be covered by open intervals of total length < 1. If we have such a set of open 
intervals Uᵢ covering [0, 1], then finitely many of the Uᵢ cover [0, 1], and their total 
length is also < 1. Then, if we merge any overlapping members of this finite set into 
single open intervals, we obtain a finite set {V₁, V₂, ..., Vₘ} of disjoint open intervals 
covering [0, 1], with total length < 1. This is clearly impossible. 

Fig. 3.7 Example of a tree 

It was precisely to clear up this point that Borel (1895) introduced what we now 
call the Heine–Borel theorem. On page 51 of that paper he commented as follows 
on the lemma that a closed interval I cannot be covered by open intervals with a 
total length less than that of I: 


One may regard this lemma as obvious; nonetheless, because of its importance, I wish to 
give a proof based on a theorem of interest in itself .... If on a line [interval] there are an 
infinity of [open] intervals, so that each point of the line lies in at least one of the intervals, 
then one can effectively determine a FINITE NUMBER of intervals among the given intervals 
with the same property (that each point of the line is in the interior of at least one of them). 


Exercises 


3.6.1 Give examples showing that the Bolzano–Weierstrass and Heine–Borel theorems do not 
hold with (0, 1) or R in place of [0, 1]. 

3.6.2 Use the Bolzano–Weierstrass theorem to show that the nested interval property implies the least 
upper bound property, as claimed at the beginning of Sect. 2.6. 

3.6.3 Prove that an infinite sequence of real numbers x₁, x₂, x₃, ... (say, in [0, 1]) contains either 
an infinite subsequence y₁ < y₂ < y₃ < ··· or an infinite subsequence z₁ > z₂ > z₃ > ···. 
(Hint: Look at a limit point of x₁, x₂, x₃, ....) 


The proofs of the Bolzano–Weierstrass and Heine–Borel theorems (and also Exercise 3.6.3) are 
based on the so-called “infinite pigeonhole principle.” This principle says that, if an infinite set is 
divided into finitely many parts, then one of the parts is infinite. 

Another theorem that begs to be proved by the infinite pigeonhole principle is the König infinity 
lemma, which states that an infinite tree whose vertices have finite degree has an infinite branch. 

A tree is a structure like that shown in Fig. 3.7, in which there is a top vertex, connected to 
other vertices by edges, which are connected to other vertices in turn, in such a way that any two 
vertices are connected by a unique sequence of edges. The degree of any vertex is the number of 
edges connecting it to other vertices. 


3.6.4 Prove that an infinite tree whose vertices have finite degree has an infinite branch, that is, an 
infinite sequence of vertices each connected by an edge to the one before. 

3.6.5 Reinterpret the proof of the Bolzano–Weierstrass theorem as the construction of an infinite 
tree, whose infinite branches correspond to limit points. 


3.7 The Cantor Set 


A surprising and important uncountable set is one known as the Cantor set. This 
set, which we will call C for short, is also known as the “middle third” set because 
of the process that constructs it: removal of an infinite sequence of open intervals 
from the unit interval [0, 1]. 

Fig. 3.8 Early stages in the construction of the Cantor set 

Fig. 3.9 Constructing the Cantor set via a tree 

The first stage removes the middle third, (1/3,2/3), leaving the two closed 
intervals [0,1/3] and [2/3,1]. The second stage removes their middle thirds, leaving 
the four closed intervals [0,1/9], [2/9,1/3], [2/3,7/9], and [8/9,1]. The third stage 
removes their middle thirds, and so on. The results of the first six stages are shown 
in Fig. 3.8. 

Each stage produces a finite union of closed intervals, each of which is 1/3 
of an interval produced at the previous stage. Each nested sequence of intervals 
from successive stages produces exactly one point in C, because the lengths of the 
intervals tend to 0. Conversely, each point of C arises in this way, so the points of C 
correspond to the infinite paths down the tree shown in Fig. 3.9. 

Each infinite path down the tree passes through a sequence of vertices, at each 
of which there is a choice to go left or right, corresponding to the choice of the 
left or right third of an interval. Thus, the points of C can be described by infinite 
sequences of the letters L and R. For example, the sequence LLLL... gives the point 
0, and RRRR... gives the point 1. 

An obvious diagonal argument shows that there are uncountably many infinite 
sequences of Ls and Rs, so C is an uncountable set. Indeed, it is clear that C is 
equinumerous with the set of infinite sequences of 0s and 1s, which was shown in 
Sect. 3.3 to be equinumerous with R. Thus, C is equinumerous with R, which is 
surprising, since C has measure zero! 
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3.7.1 Measure of the Cantor Set 
We can find the measure of C by adding up the lengths of the intervals removed 
from [0,1] in the construction of C. 

In stage 1, the length removed = 1/3; 

in stage 2, the length removed = 2 x 1/9 = 2/9 = 2/37; 

in stage 3, the length removed = 4 x 1/27 = 4/27 = 27/3°; 

in stage 4, the length removed = 8 x 1/81 = 8/81 = 23/34; 


and in general 


in stage n + 1, the length removed = 2”/3"*!, 


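So, 

\[ \text{total length removed} = \frac{1}{3} + \frac{1}{3}\left(\frac{2}{3}\right) + \frac{1}{3}\left(\frac{2}{3}\right)^2 + \frac{1}{3}\left(\frac{2}{3}\right)^3 + \cdots. \]

This is an instance of the general geometric series $a + ar + ar^2 + \cdots$ (with $a = 1/3$ and 
$r = 2/3$), which has sum $\frac{a}{1-r}$. Therefore, the total length of the intervals removed in 
the construction of C is 

\[ \frac{1/3}{1 - 2/3} = \frac{1/3}{1/3} = 1, \]

and hence the measure of C itself is zero. 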
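
As a numerical check on this sum, the following sketch adds up the removed lengths exactly using rational arithmetic. The function name `removed_length` is ours; the formula $2^k/3^{k+1}$ for the length removed at stage $k + 1$ is the one derived above.

```python
from fractions import Fraction


def removed_length(stages):
    """Total length removed from [0, 1] in the first `stages` stages.

    Stage k + 1 removes 2**k intervals, each of length 1 / 3**(k + 1).
    """
    return sum(Fraction(2**k, 3 ** (k + 1)) for k in range(stages))
```

The partial sums agree with the closed form $1 - (2/3)^n$ for the first n stages, so they approach 1 and the length left over approaches 0.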


Exercises 


The removal process that creates C has a nice interpretation in terms of base 3 (“ternary”) 
expansions of real numbers in [0,1]. 


3.7.1 Explain why removing the middle third of [0,1] leaves the numbers whose first ternary digit 
is 0 or 2. 

3.7.2 Explain why removing the middle thirds of [0,1/3] and [2/3,1] leaves the numbers whose 
first and second digits are 0 or 2. 

3.7.3 By continuing this argument, show that the numbers in C are those with ternary expansions 
that can be written entirely with 0s and 2s. 

3.7.4 Use this ternary representation to give another proof that C is uncountable. 


The Sierpiński carpet is a two-dimensional variant of the Cantor set, obtained by successively 
removing “middle thirds” from squares. Figure 3.10 shows the first three approximations to the 
Sierpiński carpet. 


3.7.5 Show that the area of the Sierpiński carpet is zero. 

Fig. 3.10 First three approximations to the Sierpiński carpet 


3.8 Higher Cardinalities 


So far we have seen infinite sets of two different cardinalities: those with the 
cardinality of N and those with the cardinality of R. Moreover, R is of higher 
cardinality than N in the sense that there is an injection from N into R, but no 
bijection, because of the diagonal argument. Cantor (1891) noticed that the diagonal 
argument may be applied to any set X to produce a set of higher cardinality than X; 
namely, the power set P(X) whose members are the subsets of X. So in fact there 
are infinite sets of infinitely many different cardinalities. 


Cantor’s Theorem on the Power Set. For any set X, there are more subsets of X 
than there are elements. 


Proof. Consider any pairing $x_i \leftrightarrow X_i$ between the elements $x_i$ of X and certain 
subsets $X_i$ of X. No matter how the pairing is made, the sets $X_i$ do not include all 
the subsets of X because they do not include the diagonal set D defined by the 
property 

\[ x_i \in D \iff x_i \notin X_i. \]

Indeed D differs from each $X_i$ with respect to the element $x_i$: if $x_i$ is in $X_i$ then $x_i$ is not 
in D, and if $x_i$ is not in $X_i$ then $x_i$ is in D. 

Thus, there are more subsets of X than there are elements of X. In other words, 
the set P(X) has higher cardinality than X. □ 
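
For a finite set the diagonal construction can be carried out mechanically. The sketch below is only an illustration: it assumes a pairing is given as a Python dictionary from elements of X to subsets of X, and the name `diagonal_set` is ours.

```python
def diagonal_set(pairing):
    """Return the diagonal set D for a pairing x -> X_x.

    `pairing` maps each element x of X to a subset of X (a Python set).
    An element x belongs to D exactly when x is not in its own partner.
    """
    return {x for x, subset in pairing.items() if x not in subset}


# Example pairing on X = {0, 1, 2} (chosen arbitrarily for illustration).
example = {0: {0, 1}, 1: set(), 2: {0, 2}}
D = diagonal_set(example)  # 0 and 2 lie in their partners, 1 does not
```

Here D differs from the partner of 0 at the element 0, from the partner of 1 at the element 1, and from the partner of 2 at the element 2, so D is not among the paired subsets.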


It follows in particular that subsets of R are more numerous than the real 
numbers. It happens that the subsets of R that come most naturally to mind (the 
Borel sets, see Chap. 8) form a collection with only as many members as R. Thus, 
we have the opportunity to use the diagonal argument to find new sets of real 
numbers beyond the obvious ones—much as we used the diagonal argument, in 
Exercise 3.5.1, to find new real numbers beyond the algebraic numbers. 

3.8.1 The Continuum Hypothesis 


With the discovery that there are different kinds of infinity, two questions arise: 


1. Does R represent the smallest uncountable infinity? In particular, is there an 
uncountable set of real numbers not equinumerous with R? 

2. Is the diagonal method essentially the only way to prove the existence of 
uncountable sets? 


The conjecture that any uncountable set of real numbers is equinumerous with 
R was first posed by Cantor (1878), and it is the first version of what is called 
the continuum hypothesis. We will discuss this hypothesis, which is not yet settled, 
further in Chap. 5. There we will also show that the diagonal method is not the 
only way to prove the existence of uncountable sets. There is another method, 
also discovered by Cantor, involving the so-called ordinal numbers. The concept of 
ordinal number also leads to a sharper statement of the continuum hypothesis, and 
to the clarification of axioms for set theory, which will be the subject of Chap. 6. 


3.8.2 Extremely High Cardinalities 


It will become clear when we discuss axioms for set theory in Chap. 6 that iteration 
of the power set operation P can produce sets of extraordinarily high cardinality. 
Just to give a taste of what is possible, consider the sequence 


N, P(N), P(P(N)), .... 


Each set in the sequence has cardinality greater than the one before, so the union of 
all these sets, 


Y = N ∪ P(N) ∪ P(P(N)) ∪ ···, 


has cardinality greater than any of N, P(N), P(P(N)), .... Then, of course, P(Y) has 
cardinality greater than Y, and so on. One wants to say “ad infinitum,” but it is no 
longer clear what that means—infinity is certainly bigger than we first thought. 

Despite the immense power of set theory to produce sets of high cardinality, 
there are “largeness” properties so exorbitant that sets with such properties cannot 
be proved to exist. For example, a set Z is called inaccessible if 


1. Z has infinite members, 

2. X ∈ Z implies P(X) ∈ Z, and 

3. X ∈ Z implies that the range of any function with domain X and values in Z is a 
member of Z. 

The existence of inaccessible sets is not provable from the standard axioms of 
set theory, for reasons that will emerge in Chap. 6. Here is a clue (though it will 
probably not help at this point): if an inaccessible set exists, then its existence is not 
provable! 


Exercises 


3.8.1 Show that P(P(N)) is equinumerous with the set of real functions. 


The power set operation is also interesting when applied to finite sets, starting with the empty 
set (denoted by { } or ∅). 


3.8.2 If $F = \{x_1, \ldots, x_n\}$ is a set with n elements, show that P(F) has $2^n$ elements. 
3.8.3 According to Exercise 3.8.2, P(∅) has one element and P(P(∅)) has two. Write down these 
elements. 


Iterating the power set operation P any finite number of times, starting with the empty set, gives 
an important series of sets $V_n$, defined inductively as follows. 


\[ V_0 = \emptyset, \qquad V_{n+1} = V_n \cup P(V_n). \]


3.8.4 Prove that any subset of $V_n$ is a member of $V_{n+1}$, and that if $X \in V_n$ then $P(X) \in V_{n+1}$. 

We now define $V_\omega$ to be the union of all the $V_n$. 


3.8.5 Show that $V_\omega$ satisfies the last two conditions for inaccessibility, but not the first. 
3.8.6 Give an example of a set that satisfies the first two conditions for inaccessibility, but not the 
last. 
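
The first few sets in the sequence defined in these exercises (starting from the empty set and repeatedly adjoining the power set) are small enough to build on a computer. The following sketch represents sets as Python `frozenset` objects so that sets can be members of other sets; the helper names `power_set` and `V` are ours, and the code is meant for experimentation rather than as a solution to any of the exercises.

```python
from itertools import combinations


def power_set(s):
    """Return the set of all subsets of s, each subset as a frozenset."""
    items = list(s)
    return frozenset(
        frozenset(combo)
        for size in range(len(items) + 1)
        for combo in combinations(items, size)
    )


def V(n):
    """Apply the step 'level -> level union power_set(level)' n times,
    starting from the empty set."""
    level = frozenset()
    for _ in range(n):
        level = level | power_set(level)
    return level


sizes = [len(V(n)) for n in range(5)]
```

The sizes grow very quickly, so the next level after these already has 65536 members and levels beyond that are out of practical reach.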


3.9 Historical Remarks 


When Cantor discovered, in 1874, that infinite sets can be countable or uncountable, 
almost all previous thinking about infinity was superseded. For example, the vague 
and contentious distinction between “potential” and “actual” infinity was replaced 
by the clear and dramatic distinction between countable and uncountable. It seems 
academic to debate whether the positive integers 1,2,3,... should be viewed as 
a collection that grows, one member at a time, or as a completed whole N = 
{1,2,3,...}, once it is known that R cannot be viewed as a collection that grows 
one member at a time, because it is uncountable. 

The discovery of uncountability also brought new clarity to the concept of 
countability. Before 1874, few examples of countably infinite sets were actually 
known, apart from N and some of its subsets. It is thought that Cantor noticed 
the countability of Q some time before he proved the uncountability of R. In 
the intervening period, Cantor asked Dedekind whether he could prove that R is 
Fig. 3.11 Georg Cantor 


countable. Dedekind was unable to do so, but he offered a proof that the algebraic 
numbers are countable (example 6 of Sect. 3.1). Ironically, Dedekind’s result 
became the centerpiece of Cantor’s paper on the uncountability of R. 

Just how this came about is explained by Ferreirós (1999), pp. 175–180. 
Apparently, Weierstrass persuaded Cantor to play down his uncountability proof 
in favor of its more topical corollary: the existence of transcendental numbers. Such 
numbers were first exhibited by Liouville (1851) and, just one year before Cantor’s 
discovery, Hermite (1873) had proved that e is transcendental. Compared with these, 
Cantor’s transcendence proof was remarkably simple—and to this day it is the most 
elementary proof known. This is why Cantor (1874) bears the (to us) unenlightening 
title “On a property of the collection of all real algebraic numbers.” 

Another theorem for which Dedekind deserves the credit is the so-called 
Cantor–Schröder–Bernstein theorem. Before any of these three had a proof, Dedekind found 
one in 1887, but omitted it (except for a key lemma) from the book, Dedekind 
(1888), he was then writing (see Ferreirós 1999, Chap. VII). For a long time, 
Dedekind had been interested in mappings between infinite sets, and in 1882 he 
proposed to define an infinite set as one that admits a bijection with a proper subset 
of itself (see Dedekind 1888, Sect. 64). It is indeed clear that a set admitting such a 
bijection must be infinite, but it is a more subtle question whether every infinite set 
admits such a bijection. We take up this issue in Sect. 7.1. 

The Bolzano–Weierstrass theorem of Sect. 3.6 gets its name because Bolzano 
(1817) used the bisection argument, and a sequence of nested intervals, in his 
attempt to prove the intermediate value theorem for continuous functions. As 
Fig. 3.12 Henry John Stephen Smith 


we mentioned in Sect. 1.6, Bolzano’s proof lacked a definition of R that could 
justify any assumption of completeness, such as the nested interval principle. 
Weierstrass (1874) revisited the theorem after definitions of R had been proposed— 
by Dedekind, Cantor, and himself—at which time the nested interval argument was 
justifiable. 

The related Heine–Borel theorem likewise began with an argument, due to Heine 
(1872), aimed at a different theorem: in this case the theorem that a continuous 
function on a closed interval is uniformly continuous. (See Sects. 4.6 and 4.7 for 
this theorem and a discussion of uniform continuity.) Borel (1895) was the first to 
prove the theorem in its present form, and also the first to recognize its importance 
in measure theory. In particular, Borel saw that the Heine–Borel theorem justifies 
the measure argument that R is uncountable. 

Harnack (1885) had observed, as we did in Sect. 3.5, that any countable set could 
be covered by intervals of arbitrarily small total measure. But Harnack was puzzled 
by the example of the countable set Q. Thinking that a covering of Q by intervals 
would cover all points, he jumped to the conclusion that the whole interval [0,1] 
could be covered by open intervals of total length ε. This of course plays havoc 
with the concept of measure, and fortunately the Heine–Borel theorem showed that 
Harnack was wrong. For this reason, Borel (1895) called the Heine–Borel theorem 
the “first fundamental theorem of measure theory.” (For more details on Harnack’s 
mistake, see Bressoud 2008, p. 63.) 

The so-called Cantor set is actually due to Smith (1875), shown in Fig. 3.12. 
However, the set plays so many important roles—in the theory of R, continuous 
functions, and measure theory—that credit for it can be shared among several 
Fig. 3.13 Dalí’s Face of War 


mathematicians. There are also higher-dimensional variations, such as the Sierpiński 
carpet, and even artists have hit upon a similar idea. 

Figure 3.13 shows a Cantor-style set in the work of Salvador Dalí. 

The Cantor (1874) argument for the uncountability of R constructs a real number 
x unequal to each member of a given sequence $x_1, x_2, x_3, \ldots$. But it is not a 
“diagonal” argument in the sense of making x unequal to $x_n$ at a predetermined 
decimal place. An argument closer to diagonalization occurs in du Bois-Reymond 
(1875), but Cantor does not seem to have been influenced by it. In any case, the 
diagonal argument in Cantor (1891) is clearly simpler and more general. Even 
Cantor may have been surprised that it was so easy to prove that there was no largest 
set—a result that he had conjectured before but only on vague grounds. 

But with the proof came other concerns. If there is no largest set, there is no set of 
all sets. This may be a problem—but it also may be a useful clarification. Certainly, 
one can no longer suppose that, for each property P, there is the set of all objects 
with property P (let P be the property of being a set). For some, this was a “crisis 
of foundations”; for others (who ultimately prevailed) it was an argument for the 
cumulative or hierarchical concept of set. In the cumulative concept, all sets arise 
from the empty set ∅ by certain operations, just as natural numbers arise from 0 by 
the successor operation. From the cumulative viewpoint, it makes no more sense to 
have the “set of all sets” than it does to have the “number of all numbers.” 

In Chap. 6 we will see exactly how the cumulative concept of set unfolds, and in 
Chaps. 6 and 9 we will touch on the question of sets so large that their existence is 
not provable. 


Chapter 4 
Functions and Limits 


PREVIEW 


Now that we are familiar with the real numbers, we can better understand some basic 
concepts of analysis: limits, convergence, and continuity. In particular, the absence 
of gaps in R explains the absence of gaps in the graph of any continuous function. 
This, and other “obvious” properties of continuous functions, depends directly on 
the completeness of R. 

On the other hand, continuous functions sometimes have extremely surprising 
properties. We construct three examples: 


• A continuous function that increases from 0 to 1 while remaining “constant 
almost everywhere.” 

• A curve with no tangents. 

• A curve that fills a square. 


The last example raises the question: is there a continuous bijection between the line 
interval [0,1] and the square [0, 1] x [0, 1]? We show that the answer is no, as one 
would hope if the concept of dimension is to be meaningful. The key to the proof is 
the so-called intermediate value theorem about the absence of gaps in the graph of 
a continuous function. 

Finally, we explain why continuity ensures that a function has an integral. In fact, 
to integrate a continuous function one needs only the simplest concept of integral: 
the Riemann integral familiar from basic calculus. 


4.1 Convergence of Sequences and Series 


In Sect. 2.6 we touched on the concepts of convergence and limit for sequences; 
namely, a sequence $c_1, c_2, c_3, \ldots$ has limit c if, for each number $\varepsilon > 0$, there is a 
natural number N such that 


\[ n > N \implies |c_n - c| < \varepsilon. \]

This relation is written $\lim_{n\to\infty} c_n = c$ for short. One of the most important uses of 


the limit concept for sequences is in defining the sum of an infinite series: 


Definition. An infinite series $a_1 + a_2 + a_3 + \cdots$ is said to converge to sum s if the 
sequence of partial sums 

\[ a_1, \quad a_1 + a_2, \quad a_1 + a_2 + a_3, \quad \ldots \]

has limit s. 


A familiar example of a convergent infinite series is the geometric series 

\[ 1 + a + a^2 + a^3 + \cdots \quad \text{for } |a| < 1. \]

This series converges because the nth partial sum is 

\[ s_n = 1 + a + a^2 + \cdots + a^n, \]

for which we find 

\[ a s_n = a + a^2 + a^3 + \cdots + a^{n+1}. \]

Subtraction then gives $s_n = \frac{1 - a^{n+1}}{1 - a}$, which has limit $\frac{1}{1-a}$ when $|a| < 1$. 
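
The closed form for the partial sums of the geometric series can be checked exactly with rational arithmetic. In this sketch the function names `partial_sum` and `closed_form` are ours, and the ratio a is assumed to be passed as a `Fraction` different from 1.

```python
from fractions import Fraction


def partial_sum(a, n):
    """Compute 1 + a + a**2 + ... + a**n term by term."""
    return sum(a**k for k in range(n + 1))


def closed_form(a, n):
    """Compute (1 - a**(n + 1)) / (1 - a), valid for a != 1."""
    return (1 - a ** (n + 1)) / (1 - a)


a = Fraction(1, 3)
agree = all(partial_sum(a, n) == closed_form(a, n) for n in range(20))
```

With a = 1/3 the partial sums approach 1/(1 - 1/3) = 3/2, and the gap to 3/2 shrinks by a factor of 3 at each step.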


Infinite decimals can also be viewed as infinite series, comparable with geometric 
series. For example 

\[ \sqrt{2} = 1.4142135\cdots = 1 + \frac{4}{10} + \frac{1}{10^2} + \frac{4}{10^3} + \frac{2}{10^4} + \cdots. \]

Since all terms of this series are $\ge 0$, the partial sums $s_n$ increase with n. Also, they 
are bounded above by the geometric series 

\[ 9 + \frac{9}{10} + \frac{9}{10^2} + \frac{9}{10^3} + \frac{9}{10^4} + \cdots = \frac{9}{1 - 1/10} = 10. \]

Thus, the series converges, to the lub of the set of its partial sums. 


4.1.1 Divergent and Conditionally Convergent Series 


For a series $a_1 + a_2 + a_3 + \cdots$ to converge it is necessary that the nth term $a_n$ have limit 
zero, otherwise the partial sums will not become arbitrarily close to any limit value. 
However, it is not sufficient for $a_n$ to have limit zero, as is shown by the famous 
example of the harmonic series: 

\[ 1 + \frac12 + \frac13 + \frac14 + \frac15 + \cdots. \]

This series is said to diverge, because its partial sums grow in size indefinitely, as 
one can see by grouping terms as follows: 

\[ 1 + \frac12 + \left(\frac13 + \frac14\right) + \left(\frac15 + \frac16 + \frac17 + \frac18\right) + \left(\frac19 + \frac1{10} + \cdots + \frac1{16}\right) + \cdots. \]



Each group has sum at least 1/2, so by taking enough groups we can make the partial 
sum as large as we please. 
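
The grouping argument is easy to check numerically. In the sketch below, the group with index k (our own indexing, starting at k = 1) consists of the terms with denominators from $2^{k-1} + 1$ to $2^k$; the function names are ours.

```python
from fractions import Fraction


def group_sum(k):
    """Sum of the k-th group: 1/(2**(k-1) + 1) + ... + 1/2**k, for k >= 1."""
    return sum(Fraction(1, m) for m in range(2 ** (k - 1) + 1, 2**k + 1))


def harmonic(n):
    """Partial sum 1 + 1/2 + ... + 1/n of the harmonic series."""
    return sum(Fraction(1, m) for m in range(1, n + 1))


groups_ok = all(group_sum(k) >= Fraction(1, 2) for k in range(1, 10))
```

Since every group contributes at least 1/2, the partial sum up to the term $1/2^k$ is at least $1 + k/2$, which is how the partial sums eventually pass any given bound.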
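Similar arguments lead to the conclusion that the series 

\[ 1 + \frac13 + \frac15 + \cdots \quad \text{and} \quad \frac12 + \frac14 + \frac16 + \cdots \]

both diverge. This leads to an interesting property of the series 

\[ 1 - \frac12 + \frac13 - \frac14 + \frac15 - \frac16 + \frac17 - \frac18 + \cdots, \]

called conditional convergence. 

The partial sums obtained by taking terms in the order shown above, namely, 

\[ 1, \quad 1 - \frac12, \quad 1 - \frac12 + \frac13, \quad 1 - \frac12 + \frac13 - \frac14, \quad \ldots \]

fall inside a sequence of nested intervals 

\[ \left[1 - \tfrac12,\ 1\right], \quad \left[1 - \tfrac12,\ 1 - \tfrac12 + \tfrac13\right], \quad \left[1 - \tfrac12 + \tfrac13 - \tfrac14,\ 1 - \tfrac12 + \tfrac13\right], \quad \ldots, \]
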


the length of which becomes arbitrarily small (because the nth interval has length 
1/(n + 1)). The partial sums therefore have a limit by the nested interval property of 
Sect. 2.6, when summed in the above order.¹ 

But if we allow the terms to be rearranged, then the series can converge to any 
number we please! This comes about because we can collect enough of the positive 
terms 

\[ 1, \ \frac13, \ \frac15, \ \ldots \]

to exceed any number we please, and the same is true of the negative terms 

\[ -\frac12, \ -\frac14, \ -\frac16, \ \ldots. \]

¹ In fact, the sum is log 2, the natural logarithm of 2, as you may recall from calculus. 

Thus, if we want the series to have the sum 3/2, for example, we do the following: 


• Collect terms $1, \frac13, \frac15, \ldots$ until their sum exceeds 3/2. This happens when we get 
to $1 + \frac13 + \frac15$. 

• Then add negative terms until the sum falls below 3/2. This happens as soon as 
$-\frac12$ is added. 

• Resume adding positive terms until the sum exceeds 3/2 again. This happens with 
$1 + \frac13 + \frac15 - \frac12 + \frac17 + \frac19 + \frac1{11} + \frac1{13} + \frac1{15}$. 

• Then resume adding negative terms (in this case just $-\frac14$) until the sum falls below 
3/2 again, and so on. 


The partial sums thereby oscillate on either side of 3/2, but they approach it ever 
more closely, since the terms of the series become ever closer to 0. Therefore, the 
sum of the series is precisely 3/2. By a similar argument we can arrange terms so 
that the sum is any number we please. 
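
The rearrangement procedure just described can be simulated directly. The sketch below is an illustration under our own conventions: the function name `rearranged_partial_sums` is ours, and when a partial sum equals the target exactly the code adds a positive term next (a tie-breaking choice the text does not need to specify).

```python
from fractions import Fraction


def rearranged_partial_sums(target, steps):
    """Rearrange 1 - 1/2 + 1/3 - 1/4 + ... to approach `target`.

    Positive terms 1, 1/3, 1/5, ... are added while the running total is
    at most the target; negative terms -1/2, -1/4, ... are added while it
    exceeds the target. Returns the terms used and the running totals.
    """
    target = Fraction(target)
    next_odd, next_even = 1, 2
    total = Fraction(0)
    terms, sums = [], []
    for _ in range(steps):
        if total <= target:
            term = Fraction(1, next_odd)
            next_odd += 2
        else:
            term = Fraction(-1, next_even)
            next_even += 2
        total += term
        terms.append(term)
        sums.append(total)
    return terms, sums


terms, sums = rearranged_partial_sums(Fraction(3, 2), 1000)
```

The first ten entries of `terms` reproduce the order described in the bullet points, and the later entries of `sums` stay close to 3/2 because each overshoot is no larger than the term that caused it.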


Exercises 


The proof that the harmonic series diverges relies on collecting groups of terms that sum to at least 
1/2. Each group of terms contains twice as many terms as the preceding group, so to exceed the 
sum k one needs around $2^k$ terms. This estimate suggests that the sum of the first n terms is around 
the size of log n. We can show that this estimate is remarkably accurate by comparing the sum 
$1 + \frac12 + \frac13 + \cdots + \frac1n$ with the geometric interpretation of log n as an area under the curve y = 1/x. 
We compare the two as shown in Fig. 4.1. The natural logarithm log n is the area under y = 1/x 
from 1 to n, while $1 + \frac12 + \frac13 + \cdots + \frac1n$ is the total area of a collection of rectangles. Each rectangle 
has width 1, and their heights are $1, \frac12, \frac13, \ldots, \frac1n$, respectively. 


4.1.1 By referring to Fig. 4.1, explain why $\log n < 1 + \frac12 + \frac13 + \cdots + \frac1n$. 
4.1.2 With the help of a similar figure, explain why $\log n > \frac12 + \frac13 + \cdots + \frac1n$. 



Fig. 4.1 Comparing the logarithm with the harmonic series 
4.1.3 Deduce from the figures in Exercises 4.1.1 and 4.1.2 that the terms 


\[ c_n = 1 + \frac12 + \frac13 + \cdots + \frac1n - \log n \]


form a bounded decreasing sequence, with limit < 1. 


The limit of the sequence $c_1, c_2, c_3, \ldots$ is known as Euler’s constant γ, and it is approximately 
0.57721. Although γ has been much studied (Havil (2003) is a whole book about it), we do not 
yet know whether γ is irrational. 
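
The sequence in Exercise 4.1.3 is easy to tabulate. The sketch below assumes the terms are the harmonic partial sum up to 1/n minus log n, computed in floating point; the function name `c` simply mirrors the notation of the exercise, and the sample values of n are arbitrary.

```python
import math


def c(n):
    """Return 1 + 1/2 + ... + 1/n - log n as a floating-point number."""
    harmonic = sum(1.0 / k for k in range(1, n + 1))
    return harmonic - math.log(n)


if __name__ == "__main__":
    # Print a few values to see how the sequence settles down.
    for n in (1, 2, 10, 100, 10_000, 1_000_000):
        print(n, c(n))
```

Running it shows the values settling toward the quoted approximation 0.57721, with the difference from the limit shrinking roughly like 1/(2n).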

The ability of a conditionally convergent series to represent any real number gives an interesting 
proof that R is equinumerous with the set of permutations of N. (A permutation of N is a bijection 
N → N. Informally, a permutation is a “rearrangement.”) 


4.1.4 Use a conditionally convergent series to define an injection from R into {permutations of N}. 
4.1.5 Using binary sequences, say, define an injection from {permutations of N} into R. 
4.1.6 Hence prove that the set {permutations of N} is equinumerous with R. 


4.2 Limits and Continuity 


When we try to capture the notion of continuity—for example, to say what it means 
for a graph or a curve to be “unbroken”—we know that we have to fall back on the 
completeness of R. Completeness is related to limits, by Sect. 2.6, so it is no surprise 
that continuity involves the limit concept. The limit of a real function is defined as 
follows. 


Definition. A real function f is said to have limit l as x tends to a if, for each $\varepsilon > 0$, 
there is a $\delta > 0$ such that 


\[ 0 < |x - a| < \delta \implies |f(x) - l| < \varepsilon. \]


This relationship can be expressed informally by saying that “f(x) approaches 
l as x approaches a,” and the notation for it is $\lim_{x\to a} f(x) = l$. The reason for the 
condition 0 < |x — a| is that we really want to express the behavior as x approaches 
a—not what happens when x = a. For example, the function 


\[ f(x) = \begin{cases} 0 & \text{for } x \neq 0 \\ 1 & \text{for } x = 0 \end{cases} \]


has limit 0 as x approaches 0, even though $f(0) \neq 0$. We would not want to say, 
however, that this function is continuous at 0, since its graph has a clear break there. 
The definition of continuity says that the limit exists and that it equals the function 
value: 


Definition. A real function f is continuous at x = a if $\lim_{x\to a} f(x) = f(a)$, and f is 
continuous on a set S (typically S = R or some interval) if f is continuous at each 
point of S. 

These definitions seem to capture our intuitive concept of continuity, inasmuch 
as the continuous functions include the functions that obviously have “unbroken” 
graphs, and they exclude functions with graphs that are obviously “broken” in some 
way. Here are some examples. 


1. Constant functions f(x) = c are continuous, as is the identity function f(x) = x. 
For a constant function f(x) = c we have 

\[ |x - a| < \delta \implies |f(x) - f(a)| < \varepsilon \]

for any δ whatever, because in fact |f(x) − f(a)| = |c − c| = 0. 
For the identity function f(x) = x it suffices to choose δ = ε, because then 

\[ |x - a| < \delta \implies |x - a| < \varepsilon \implies |f(x) - f(a)| < \varepsilon \]

since f(x) = x and hence f(a) = a. 

2. If $f_1$ and $f_2$ are continuous functions, then so are $f_1 + f_2$, $f_1 - f_2$, $f_1 \cdot f_2$ and (at 
points where $f_2 \neq 0$) $f_1/f_2$. Thus, it follows, from the previous example, that all 
rational functions are continuous at the points where they are defined. 

Here we will explain why $f_1 + f_2$ is continuous; the proofs for the other cases 
are indicated in the exercises. Suppose that $f_1$ and $f_2$ are both continuous at x = a. 
We want to prove that $f_1 + f_2$ is also continuous there. This means proving, given 
$\varepsilon > 0$, that there is a $\delta > 0$ such that 

\[ 0 < |x - a| < \delta \implies |f_1(x) + f_2(x) - f_1(a) - f_2(a)| < \varepsilon. \]

Now the continuity of $f_1$ gives us a $\delta_1$ such that 

\[ 0 < |x - a| < \delta_1 \implies |f_1(x) - f_1(a)| < \varepsilon/2, \]

and the continuity of $f_2$ gives us a $\delta_2$ such that 

\[ 0 < |x - a| < \delta_2 \implies |f_2(x) - f_2(a)| < \varepsilon/2. \]

So if we take $\delta = \min(\delta_1, \delta_2)$ we have 

\begin{align*}
0 < |x - a| < \delta &\implies |f_1(x) - f_1(a)| < \varepsilon/2 \text{ and } |f_2(x) - f_2(a)| < \varepsilon/2 \\
&\implies |f_1(x) + f_2(x) - f_1(a) - f_2(a)| < \varepsilon/2 + \varepsilon/2 = \varepsilon,
\end{align*}

as required. 
3. The function f(x) = 1/x is not continuous at x = 0. 
No matter what value we give to f(0), f(x) has no limit at all as x 
approaches 0. When x approaches 0 from the right 1/x grows beyond all positive 
bounds, and from the left 1/x grows beyond all negative bounds. 
4. The Dirichlet function 

\[ f(x) = \begin{cases} 1 & \text{if } x \text{ is rational} \\ 0 & \text{if } x \text{ is irrational} \end{cases} \]

is not continuous at any point. 

This is because $\lim_{x\to a} f(x)$ does not exist at any point x = a. Any interval on 
the line contains both rational and irrational points, so |f(x) − f(a)| varies by 1 in 
any interval, and hence cannot be made smaller than every ε > 0. 

It may seem that this function is as discontinuous as it can possibly be, but we 
will see in Chap. 8 that the Dirichlet function is, in a sense, not far removed from 
continuous functions. 


In some sense, continuous functions are the simplest functions, and discontinuous 
functions can indeed be very complicated. However, continuous functions can 
also have interesting complications, as we will see in the next section. 

To conclude this section we observe a consequence of continuity that involves 
limits of sequences rather than limits of functions. 


Sequential Continuity. If f is continuous at a and $a_1, a_2, a_3, \ldots$ is any sequence 
with limit a, then the sequence $f(a_1), f(a_2), f(a_3), \ldots$ has limit f(a). 


Proof. Since f is continuous at x = a, for each ε > 0 there is a δ > 0 such that 

\[ 0 < |x - a| < \delta \implies |f(x) - f(a)| < \varepsilon. \]

Now, since $a_1, a_2, a_3, \ldots$ has limit a, for this δ > 0 there is an N such that 

\[ n > N \implies |a_n - a| < \delta \implies |f(a_n) - f(a)| < \varepsilon, \]

and this means that the sequence $f(a_1), f(a_2), f(a_3), \ldots$ has limit f(a). □ 


This result has, as a corollary, the property of continuous functions that we used 
in Sect. 3.4: 


Corollary 1. Any continuous function on R is determined by its values on the 
rational numbers. 


Proof. Any irrational number a is the limit of a sequence of rational numbers 
$a_1, a_2, a_3, \ldots$. But then f(a) is determined by the values of f on the rational 
numbers, namely, $f(a) = \lim_{n\to\infty} f(a_n)$. □ 


Exercises 


4.2.1 By an argument like that used above to prove that $f_1 + f_2$ is continuous at x = a if $f_1$ and $f_2$ 
are, prove that $f_1 - f_2$ is also continuous at x = a. 

Fig. 4.2 Graph of the Thomae function 


4.2.2 Use the identity 

\[ f_1(x) f_2(x) - f_1(a) f_2(a) = f_1(x)[f_2(x) - f_2(a)] + f_2(a)[f_1(x) - f_1(a)] \]

to prove that, if $f_1$ and $f_2$ are continuous at x = a, then so is $f_1 f_2$. 

4.2.3 Prove that, if f is continuous at x = a and f(a) ≠ 0, then 1/f is continuous at x = a. 

4.2.4 Deduce from the previous exercises that if $f_1$ and $f_2$ are continuous at x = a, and if $f_2(a) \neq 0$, 
then $f_1/f_2$ is continuous at x = a. 

4.2.5 Use sequential continuity to prove that f(g(x)) is continuous at x = a if f and g are. (The 
question whether sequential continuity implies continuity will be taken up in Sect. 7.1.) 
Also prove this assuming only ordinary continuity. 


Another interesting discontinuous function is the Thomae function (also known as the “popcorn 
function”), due to Thomae (1879) and defined by 

\[ t(x) = \begin{cases} 1/q & \text{if } x \text{ is rational and } x = p/q \text{ in lowest terms} \\ 0 & \text{if } x \text{ is irrational.} \end{cases} \]

Figure 4.2 shows an approximation to its graph. 
Despite its similarity to the Dirichlet function, the Thomae function is not discontinuous 
everywhere. 


4.2.6 Show that t(x) is discontinuous at x = p/q but continuous elsewhere. 


4.3 Two Properties of Continuous Functions 


The suitability of our definition of continuous functions is confirmed by the 
following two theorems, which express properties of continuous functions that our 
intuition expects. 


Intermediate Value Theorem. If f is continuous on interval [a, b], with f(a) < 0 
and f(b) > 0, then there is a value c ∈ [a, b] with f(c) = 0. 

Proof. Consider the set of points x for which f is negative “all the way up to x”: 

\[ S = \{x \in [a,b] : f(y) < 0 \text{ for all } y \le x\}. \]

S is nonempty, because a is a member, and bounded above by b, so S has a least 
upper bound c. Now consider whether f(c) can be nonzero. 

If f(c) = ε > 0 then, by continuity, there is a δ > 0 with f(c − δ) > 0. Since 
c − δ ∈ S, this contradicts the definition of S. If f(c) = −ε < 0 then there is similarly 
a δ > 0 with f(y) < 0 for y < c + δ, which again contradicts the definition of S. 

Thus, the only possibility is f(c) = 0. □ 
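
The theorem guarantees that a zero exists; to approximate one numerically, a common approach is bisection. Note that this is a different technique from the least upper bound argument in the proof above, and the sketch below is only an illustration: the function name `bisect_root` and the tolerance parameter `tol` are our own choices, and floating-point arithmetic is assumed.

```python
def bisect_root(f, a, b, tol=1e-12):
    """Approximate a zero of a continuous f on [a, b], given f(a) < 0 < f(b).

    The interval is halved repeatedly, always keeping a half on which f
    changes sign from negative to positive.
    """
    if not (f(a) < 0 < f(b)):
        raise ValueError("need f(a) < 0 and f(b) > 0")
    while b - a > tol:
        mid = (a + b) / 2
        value = f(mid)
        if value == 0:
            return mid
        if value < 0:
            a = mid  # sign change lies in the right half
        else:
            b = mid  # sign change lies in the left half
    return (a + b) / 2


root = bisect_root(lambda x: x * x - 2, 0.0, 2.0)
```

Applied to $x^2 - 2$ on [0, 2], the result approximates the square root of 2; the intervals produced form a nested sequence of the kind used elsewhere in this chapter.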


The first to realize that the intermediate value property was provable was Bolzano 
(1817), and he also realized that the least upper bound property was the key to the 
proof. However, the least upper bound property was not available until Dedekind 
gave a precise definition of real numbers, by Dedekind cuts, in 1858. Once the 
least upper bound property was established, all the basic theorems about continuous 
functions became provable. The next theorem is another example. 


Extreme Value Theorem. A continuous function on a closed interval takes a 
maximum value (and, similarly, a minimum value). 


Proof. To create an opportunity to use the least upper bound property, we first prove 
that a continuous function f has a bounded set of values on the closed interval [a, b]. 

If not, repeatedly bisect the interval [a,b], each time choosing the leftmost half 
in which f has arbitrarily large values. In this way we obtain a sequence of closed 
intervals $I_1, I_2, I_3, \ldots$, in each of which f is unbounded, yet the length of the intervals 
tends to zero. It follows that the intervals have a single common point, c ∈ [a, b]. 
Then, by continuity, in a sufficiently small interval $I_n$ containing c, the value of f 
remains within distance ε of f(c), so f is not unbounded on $I_n$. 

This contradiction shows that the set {f(x) : x ∈ [a, b]} is bounded, and so it has 
a least upper bound l. If f does not take the value l, then the function 1/(l − f(x)) 
is continuous on [a,b]. But then 1/(l − f(x)) is bounded, by the argument above, 
which contradicts the fact that l − f(x) becomes arbitrarily small (because l is the 
least upper bound of the values of f). 

The latter contradiction shows that f(x) takes the value l, which is necessarily its 
maximum value. Similarly, the continuous function −f(x) takes a maximum value 
m, in which case −m is the minimum value of f(x). □ 


4.3.1 The Devil’s Staircase 


In this subsection we discuss a function that throws new light on the intermediate 
value theorem. It is a function that takes all values between 0 and 1, while “almost 
never” changing value. For this function, both continuity and the intermediate value 
property verge on the miraculous. 
Fig. 4.3 Stage 5 in the construction of the Devil’s staircase 


The Devil’s staircase is the graph of a continuous function F that is constant 
on each interval in the complement of the Cantor set. F is constructed in stages as 
follows: 


Stage 1. Let F(0) = 0, F(1) = 1, and F(x) = 1/2 on the interval (1/3, 2/3). 

Stage 2. Let F(x) = 1/4 on the interval (1/9, 2/9), and let F(x) = 3/4 on the 
interval (7/9, 8/9). 

Stage n. On the middle third of each interval (a, b) on which F is still undefined, 
but such that F(a) and F(b) are defined, let F take the value halfway between 
F(a) and F(b). 


Figure 4.3 shows the graph of the function F at Stage 5. 

After all finite stages are completed (so F is defined on all intervals in the 
complement of the Cantor set), any x for which F(x) is still undefined will be 
arbitrarily close to points for which F(x) is defined. Moreover, the difference 
between defined values F(a), F(b) for a < x < b becomes arbitrarily small as a 
and b approach x, so we can define F(x) uniquely as the limit of the values F(a) as 
a → x. 

It follows that F is a continuous function with values including all the binary 
fractions $m/2^n$ between 0 and 1, and all the limit points of these fractions, that is, 
all the real numbers between 0 and 1. In fact, the values of F on the Cantor set 
itself include all the binary fractions (as values at the endpoints of intervals in the 
complement) and their limit points, so F even maps the Cantor set continuously 
onto [0,1]. 
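
The values of F can be approximated by reading off ternary digits, which is one standard way to compute this function and agrees with the stage-by-stage construction above: digits 0 and 2 become binary digits 0 and 1, and the first ternary digit 1 (meaning x lies in a removed middle third) ends the expansion with a final binary 1. In the sketch below the function name `cantor_function` and the cutoff parameter `depth` are our own; with a finite `depth` the result is exact on the removed intervals reached within that many stages and an approximation elsewhere.

```python
from fractions import Fraction


def cantor_function(x, depth=40):
    """Approximate the Devil's staircase F at a rational x in [0, 1]."""
    x = Fraction(x)
    if not 0 <= x <= 1:
        raise ValueError("x must lie in [0, 1]")
    if x == 1:
        return Fraction(1)
    value = Fraction(0)
    weight = Fraction(1, 2)  # value of the current binary place
    for _ in range(depth):
        x *= 3
        digit = int(x)  # next ternary digit: 0, 1, or 2
        x -= digit
        if digit == 1:
            return value + weight  # x is in a removed middle third
        if digit == 2:
            value += weight
        weight /= 2
    return value
```

For instance, the function returns 1/2 throughout [1/3, 2/3], returns 1/4 at 1/9 and 3/4 at 7/9 as in Stage 2, and at the Cantor set point 1/4 it approaches the value 1/3.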


Exercises 


A simple consequence of the intermediate value theorem is the following special case (the 
one-dimensional case) of the famous Brouwer fixed-point theorem: Any continuous map 
f : [0,1] → [0,1] has a fixed point; that is, a value c ∈ [0,1] such that f(c) = c. 


4.3.1 Show that any continuous f : [0,1] → [0,1] has a fixed point, by considering intermediate 
values of a suitable function. 

4.3.2 Give an example to show that a continuous function on an open interval need not have 
extreme values. 


An explicit continuous map f of C onto [0,1] may be described as follows. Recall from the 
exercises to Sect. 3.7 that each $x \in C$ has a unique ternary expansion using only the digits 0 and 
2. Let f send this x to the number whose binary expansion has 1 in each place where the ternary 
expansion of x has a 2. 
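
The digit rule just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a point of C is represented by a finite list of its leading ternary digits (so the expansion is treated as ending in zeros), and the names `cantor_map` and `ternary_value` are hypothetical helpers, not notation from the text.

```python
from fractions import Fraction

def ternary_value(digits):
    """Value of 0.d1 d2 d3 ... in base 3 for a finite list of digits."""
    return sum(Fraction(d, 3 ** place) for place, d in enumerate(digits, start=1))

def cantor_map(digits):
    """Send the ternary digits (each 0 or 2) of a point of C to the number
    whose binary expansion has 1 wherever the ternary expansion has 2."""
    value = Fraction(0)
    for place, d in enumerate(digits, start=1):
        if d not in (0, 2):
            raise ValueError("points of C use only the ternary digits 0 and 2")
        value += Fraction(d // 2, 2 ** place)
    return value

# x = (0.02)_3 = 2/9 is sent to (0.01)_2 = 1/4,
# and x = (0.2)_3 = 2/3 is sent to (0.1)_2 = 1/2.
examples = [(ternary_value(d), cantor_map(d)) for d in ([0, 2], [2, 0], [2, 2, 2])]
```

These values agree with the staged construction of F above, where F equals 1/4 on (1/9, 2/9) and 1/2 on (1/3, 2/3).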


4.3.3 Explain why f is onto [0,1]. 
4.3.4 Show that $f(x)$ and $f(x')$ differ by less than $2^{-n}$ if $x$ and $x'$ differ by less than $3^{-n}$, so that f 
is continuous. 


4.4 Curves 


We define a curve (strictly, a curve with endpoints, which may be identical), in the 
plane $\mathbb{R}^2$ say, to be a continuous function $f : [0,1] \to \mathbb{R}^2$. This formalizes the idea 
of tracing the curve by moving a point along it in a unit time interval: $f(t)$ is the 
position of the point at time $t$. It might be thought sufficient to take the range of the 
function f to be the curve. An example that explains why this is not sufficient is the 
second one in this section: a continuous curve that fills a square. 

Our first example is another that debunks a long-held belief about curves: that 
they always have tangents. 


4.4.1 A Curve Without Tangents 


Helge von Koch (1904) gave a lovely example of a curve without tangents, obtained 
as the limit of the sequence of polygons shown in Fig. 4.4. 

Fig. 4.4 The Koch polygon sequence 

We will not give a rigorous proof that the Koch curve has no tangents, but 
rather encourage the reader to visualize a process of repeated magnification of 
the curve. Unlike a smooth curve, which looks more and more like a straight 
line under magnification, any portion of the Koch curve looks exactly the same 
when magnified by 3. If the Koch curve had tangents, magnification would 
show it becoming straighter in the neighborhood of any point where a tangent 
exists. 
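
The self-similarity under magnification by 3 can be checked numerically on the polygons themselves. The sketch below assumes the usual Koch construction (each segment is replaced by four segments of one third the length, with an equilateral bump on the middle third) and represents points of the plane as complex numbers; `koch_refine` and `stages` are hypothetical names chosen for this illustration.

```python
import cmath
import math

ROTATE_60 = cmath.exp(1j * math.pi / 3)  # rotation by 60 degrees

def koch_refine(points):
    """Replace each segment p -> q by the four segments of the Koch construction."""
    refined = [points[0]]
    for p, q in zip(points, points[1:]):
        third = (q - p) / 3
        a = p + third                  # end of the first third
        apex = a + third * ROTATE_60   # tip of the equilateral bump
        c = p + 2 * third              # start of the last third
        refined.extend([a, apex, c, q])
    return refined

# stages[k] is the k-th polygon, starting from the segment from 0 to 1.
stages = [[0 + 0j, 1 + 0j]]
for _ in range(5):
    stages.append(koch_refine(stages[-1]))

# The first quarter of stage 5, magnified by 3, should coincide with stage 4.
first_quarter = stages[5][: 4**4 + 1]
max_gap = max(abs(3 * z - w) for z, w in zip(first_quarter, stages[4]))
```

Here `max_gap` is zero up to floating-point rounding, which is the polygonal counterpart of the statement that a portion of the curve looks the same when magnified by 3.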


4.4.2 A Space-Filling Curve 


Following Peano (1890), we prove that such a curve exists by describing how to 
move a point continuously through the square in a unit time interval, so that each 
point in the square is visited at some time between 0 and 1. 


Peano’s space-filling curve. There is a continuous surjection f of the unit interval 
[0, 1] onto the unit square [0, 1] × [0, 1]. 


Proof. Intuitively speaking, f(t) is the position at time t of a continuously moving 
point, beginning at the bottom left corner (0, 0) of the square at time 0, and ending 
at the top right corner (1, 1) at time 1. Thus, a first approximation to the curve is a 
line segment from (0, 0) to (1, 1). 

We refine the approximation by dividing the square into nine equal subsquares, 
constraining the moving point to spend 1/9 of the unit time interval in each, and 
traveling from one corner to its opposite in the order shown in Fig. 4.5. Inside the 
corners in question we show the time at which the moving point arrives. (Notice that 
certain corners are visited more than once—this is unavoidable.) 

Fig. 4.5 Refining an approximation to the Peano curve 

We repeat this process in each subsquare, dividing it into nine equal squares and 
visiting them in zigzag order in nine equal time intervals, and so on. It follows 
that, for any point P in the square, the moving point visits P at some time t. If 
P is the limit point of a nested sequence of subsquares, then t is the limit of the 
corresponding nested sequence of time intervals. 

The function 

\[ f(t) = \text{position of the moving point at time } t \]

is therefore a surjection of [0,1] onto the unit square. It is also clear that f is 
continuous at each $t \in [0,1]$. Given any $\varepsilon > 0$, we can ensure that $f(t')$ is within 
$\varepsilon$ of $f(t) = P$ by finding a subsquare that contains P and is small enough that all its 
points are within distance $\varepsilon$ of P. Then if $t'$ lies within the corresponding subinterval 
of [0, 1] we have 

\[ |f(t') - f(t)| < \varepsilon. \]

If $t$ is not on the boundary of the subinterval, call it $I$, we take 

\[ \delta = \text{distance from } t \text{ to the nearest end of } I \]

to ensure that 

\[ |t - t'| < \delta \implies |f(t) - f(t')| < \varepsilon. \]

If $t$ lies on the boundary of two subintervals $I_1$, $I_2$ we take $\delta$ to be the minimum of 
their lengths, and it is again true that 

\[ |t - t'| < \delta \implies |f(t) - f(t')| < \varepsilon, \]

because the moving point travels through subsquares of equal size in equal 
subintervals of time, and we have already arranged that f(t) moves distance less 
than $\varepsilon$ from P in the time intervals in question. □ 

Exercises 


4.4.1 Prove that the Koch curve has infinite length, by showing that the lengths of the polygons 
tending to it grow without bound. 

4.4.2 Show in fact that the Koch curve has infinite length between any two of its points. 

4.4.3 Similarly prove that the Peano curve has infinite length between any two of its points. 

4.4.4 Give an example of a function f whose graph y = f(x) for 0 < x < 1 has a tangent at every 
point and infinite length. 

4.4.5 Show, however, that the curve y = f(x) in Exercise 4.4.4 has finite length between any two 
of its points. 


4.5 Homeomorphisms 


In Sect. 3.4.1 we constructed a bijection between the line $\mathbb{R}$ and the plane $\mathbb{R} \times \mathbb{R}$, and 
it is not hard to modify it so as to obtain a bijection between the line segment [0, 1] 
and the square [0, 1] × [0, 1]. These bijections are not continuous, but in Sect. 4.4 
we have seen a continuous map of [0, 1] onto [0, 1] × [0, 1]. This raises the question 
whether there is a continuous bijection between [0, 1] and [0, 1] × [0, 1], or between 
the line and the plane. Such a bijection would be truly disturbing, because it would 
say that there is essentially no difference between one dimension and two, so the 
concept of dimension would have no meaning. 

The concept of dimension was saved when Brouwer (1911) proved the following 
theorem on invariance of dimension: when $m \ne n$ there is no bijection between $\mathbb{R}^m$ 
and $\mathbb{R}^n$ that is continuous in both directions. A bijection that is continuous in both 
directions is called a homeomorphism, and the study of properties that are invariant 
under homeomorphisms is the subject of topology. Obviously, topology overlaps 
with the study of real numbers and continuous functions. But it is a big subject, 
and we cannot go far into it here. We will be content to prove the simplest case of 
invariance of dimension: 


Distinctness of dimensions one and two. There is no continuous bijection between 
$\mathbb{R}$ and $\mathbb{R}^2$. 


Proof. We use a property of $\mathbb{R}$ that it does not share with $\mathbb{R} \times \mathbb{R}$; namely, $\mathbb{R}$ can be 
separated by a point. To be precise, if we remove the point 0, the points 1 and $-1$ in 
the resulting set $\mathbb{R} - \{0\}$ cannot be joined by a continuous path. Indeed, a continuous 
path from $-1$ to 1 would be a continuous function $f : [0,1] \to \mathbb{R} - \{0\}$ with $f(0) = -1$ and 
$f(1) = 1$. Such a function would violate the intermediate value theorem, because it 
does not take the value 0. 

Now if there is a bijection $g : \mathbb{R} \to \mathbb{R}^2$, continuous in both directions, consider 
the points $g(-1)$, $g(0)$, and $g(1)$. These are three distinct points in the plane, so there 
is a continuous path from $g(-1)$ to $g(1)$ that does not meet $g(0)$. If we transport this 
path back to $\mathbb{R}$ by the continuous bijection $g^{-1}$ we get a continuous path from $-1$ to 
1 not meeting 0. 

As we have just seen, such a path is impossible in $\mathbb{R}$, so the continuous bijection 
g does not exist. □ 


Exercises 


In a striking contrast to the topological distinctness of $\mathbb{R}$ and $\mathbb{R} \times \mathbb{R}$, there is no such distinction 
between $C$ and $C \times C$, where C is the Cantor set. Given any $x \in C$, we separate its ternary 
expansion into the sequence $x_1$ of odd-position digits and the sequence $x_2$ of even-position digits. 
For example, if 

\[ x = (0.022022022022\ldots)_3 \]

then 

\[ x_1 = (0.022022\ldots)_3 \]

and 

\[ x_2 = (0.202202\ldots)_3. \]
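
The splitting of digits in this example can be reproduced mechanically. The following sketch works on a finite list of leading ternary digits, which is an assumption made for illustration (the actual map acts on infinite expansions); `split_digits` and `merge_digits` are hypothetical names.

```python
def split_digits(digits):
    """Separate a digit sequence into its odd-position and even-position digits."""
    odd_position = digits[0::2]   # positions 1, 3, 5, ...
    even_position = digits[1::2]  # positions 2, 4, 6, ...
    return odd_position, even_position

def merge_digits(odd_position, even_position):
    """Interleave two digit sequences of equal length, undoing split_digits."""
    merged = []
    for a, b in zip(odd_position, even_position):
        merged.extend([a, b])
    return merged

x = [0, 2, 2] * 4            # leading digits of (0.022022022022...)_3
x1, x2 = split_digits(x)     # leading digits of x_1 and x_2
restored = merge_digits(x1, x2)
```

For this x the lists `x1` and `x2` begin 0, 2, 2, 0, 2, 2 and 2, 0, 2, 2, 0, 2, matching the expansions displayed above, and `restored` equals `x`.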


4.5.1 Explain why the map $x \mapsto (x_1, x_2)$ is a bijection of $C$ onto $C \times C$. 
4.5.2 Also explain why the map $x \mapsto (x_1, x_2)$ is continuous and has a continuous inverse. 


It is not always the case that a continuous bijection has a continuous inverse. 


4.5.3 Give an example of a bijection between [0,1) and the circle that is continuous in one 
direction but not in the other. 


4.6 Uniform Convergence 


The continuous Peano function f of Sect. 4.4 can be viewed as the limit of a 
sequence of very simple continuous functions $f_1, f_2, f_3, \ldots$. Each $f_i$ maps the unit 
interval into a polygonal path through the unit square, zigzagging through the 
diagonals of subsquares of width $1/3^i$. 

The sequence $f_1, f_2, f_3, \ldots$ not only converges to f, it converges uniformly in the 
following sense: 


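Definition. Functions $f_n$ converge uniformly to $f$ if, for any $\varepsilon > 0$, we can find an 
N such that 

\[ n > N \implies |f_n(t) - f(t)| < \varepsilon \quad \text{for all } t. \]

In other words, for every $n > N$, $f_n(t)$ and $f(t)$ differ by less than $\varepsilon$ at all points t. 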


This property is clear for the functions $f_n$ that converge to the Peano curve. If 
$n > N$, then the polygonal path defined by $f_n$ is a refinement of the path defined 
by $f_N$, and it falls within the same zigzag sequence of squares traversed by $f_N$. 
Therefore, $f_n$ and $f$ differ from each other by at most the diameter of these squares, 
which can be made as small as we please by choosing N sufficiently large. 

Fig. 4.6 The spike function 

This idea gives the following criterion for the limit of a sequence of continuous 
functions to be continuous. 


Uniform Convergence Criterion. If $f_1, f_2, f_3, \ldots$ is a uniformly convergent 
sequence of continuous functions on an interval [a, b], then 

\[ f(x) = \lim_{n \to \infty} f_n(x) \]

is also continuous. 


Proof. For each $c \in [a, b]$ we wish to show that $\lim_{x \to c} f(x) = f(c)$. That is, for each 
$\varepsilon > 0$ we seek a $\delta > 0$ such that 

\[ |x - c| < \delta \implies |f(x) - f(c)| < \varepsilon. \]

We can do this by finding an N and a $\delta$ for which the following three conditions 
hold simultaneously: 

1. $|f(x) - f_n(x)| < \varepsilon/3$ for $n > N$, 
2. $|f_n(x) - f_n(c)| < \varepsilon/3$ for $|x - c| < \delta$, 
3. $|f_n(c) - f(c)| < \varepsilon/3$ for $n > N$. 

Conditions 1 and 3 can be met simultaneously by uniform convergence of the 
sequence $f_1, f_2, f_3, \ldots$. Condition 2 can be met for a fixed $n > N$, say $n = N + 1$, 
by the continuity of $f_n$, for some $\delta$ depending on N and c. 

So, if we first choose N to meet conditions 1 and 3, then choose $\delta$ to meet 
condition 2, the triangle inequality gives 

\[ |f(x) - f(c)| \le |f(x) - f_n(x)| + |f_n(x) - f_n(c)| + |f_n(c) - f(c)| < \varepsilon \quad \text{for } |x - c| < \delta, \]

as required. □ 


Without the condition of uniform convergence a convergent sequence of continuous 
functions may have a discontinuous limit. For example, let $f_n(x)$ be the function 
with the spike-shaped graph shown in Fig. 4.6. That is, 

\[
f_n(x) =
\begin{cases}
0 & \text{if } x \le -1/n \\
nx + 1 & \text{if } -1/n \le x \le 0 \\
-nx + 1 & \text{if } 0 \le x \le 1/n \\
0 & \text{if } x \ge 1/n.
\end{cases}
\]

It is clear, since the spike becomes arbitrarily thin as n increases, that 

\[
f(x) = \lim_{n \to \infty} f_n(x) =
\begin{cases}
0 & \text{if } x < 0 \\
1 & \text{if } x = 0 \\
0 & \text{if } x > 0,
\end{cases}
\]

which is discontinuous at x = 0. The discontinuity arises from the nonuniform 
convergence of the $f_n$ to f. In particular, there is no N for which 

\[ n > N \implies |f_n(x) - f(x)| < 1/2 \quad \text{for all } x, \]

because we always have $|f_n(1/2n) - f(1/2n)| = 1/2$. 
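
The failure of uniform convergence can be seen by direct evaluation. The sketch below encodes the spike function in the equivalent form 1 − n|x| on the interval from −1/n to 1/n, using exact rational arithmetic; the names `spike` and `limit_function` are hypothetical, introduced only for this illustration.

```python
from fractions import Fraction

def spike(n, x):
    """The spike function f_n: height 1 at x = 0, zero outside (-1/n, 1/n)."""
    if abs(x) >= Fraction(1, n):
        return Fraction(0)
    return 1 - n * abs(x)

def limit_function(x):
    """The pointwise limit f: 1 at x = 0 and 0 elsewhere."""
    return Fraction(1) if x == 0 else Fraction(0)

# At the moving point x = 1/(2n) the gap |f_n(x) - f(x)| never shrinks.
gaps = []
for n in (1, 10, 100, 1000):
    x = Fraction(1, 2 * n)
    gaps.append(abs(spike(n, x) - limit_function(x)))

# At a fixed point such as x = 1/100, however, f_n(x) is 0 once n >= 100.
fixed_point_values = [spike(n, Fraction(1, 100)) for n in (100, 1000)]
```

Every entry of `gaps` is exactly 1/2, while the values at the fixed point are 0. Convergence holds at each point separately, but no single N works for all x at once.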


Exercises 


4.6.1 If $g_1, g_2, g_3, \ldots$ are the continuous functions defining the first, second, third, ... polygonal 
approximations to the Koch curve given in Sect. 4.4, show that the sequence $g_1, g_2, g_3, \ldots$ 
converges uniformly. 

4.6.2 Give an example of a function with infinitely many discontinuities that is the (nonuniform) 
limit of a sequence of continuous functions. 


4.7 Uniform Continuity 


Any continuous function on a closed interval [a,b] is actually continuous in the 
“uniform” sense exhibited by the Peano curve: the variation of f(x) can be kept 
within $\varepsilon$ by keeping the variation of x within some $\delta$ which does not depend on x. 
The formal definition of uniform continuity can be stated as follows. 


Definition. A function f is uniformly continuous on a set S if, for any $\varepsilon > 0$, 
there is a $\delta > 0$ such that, for all $x, y \in S$, 

\[ |x - y| < \delta \implies |f(x) - f(y)| < \varepsilon. \]

We notice that a uniformly continuous function is continuous at each point 
$c \in S$, because, if we fix $y = c$, we have 

\[ |x - c| < \delta \implies |f(x) - f(c)| < \varepsilon, \]

so $\lim_{x \to c} f(x) = f(c)$. 

However, continuity does not always imply uniform continuity. The function 
f(x) = 1/x is continuous on the set S = (0, 1) but not uniformly continuous, because 
we can have $|f(x) - f(y)| = 1$ while $|x - y|$ is as small as we please, by choosing x 
and y sufficiently small. 
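
One concrete way to realize this, offered as an illustration rather than as the choice the text has in mind, is to take x = 1/(n + 1) and y = 1/n. Then f(x) − f(y) = (n + 1) − n = 1, while |x − y| = 1/(n(n + 1)). The sketch below checks this with exact fractions; `reciprocal` and `pairs` are hypothetical names.

```python
from fractions import Fraction

def reciprocal(x):
    """The function f(x) = 1/x on (0, 1)."""
    return 1 / x

# Pairs of points in (0, 1) that get closer together as n grows.
pairs = [(Fraction(1, n + 1), Fraction(1, n)) for n in (2, 10, 100, 1000)]

value_gaps = [abs(reciprocal(x) - reciprocal(y)) for x, y in pairs]
point_gaps = [abs(x - y) for x, y in pairs]
```

Each entry of `value_gaps` is exactly 1, while `point_gaps` shrinks from 1/6 to 1/1001000, so no single $\delta$ can serve for $\varepsilon = 1$ over the whole of (0, 1).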

The concepts of continuity and uniform continuity agree on closed intervals S, 
thanks to the following theorem. 


Uniform continuity on closed intervals. A continuous function on a closed 
interval is uniformly continuous. 


Proof. Suppose f is continuous on [a, b]. Since f is continuous at each $c \in [a, b]$, 
for each $\varepsilon > 0$ there is a $\delta(c)$ such that 

\[ |x - c| < \delta(c) \implies |f(x) - f(c)| < \varepsilon/2. \]

It follows that 

\[ x, y \in (c - \delta(c), c + \delta(c)) \implies |f(x) - f(y)| < \varepsilon, \]

because $|f(x) - f(c)| < \varepsilon/2$ and $|f(y) - f(c)| < \varepsilon/2$. Thus, each $c \in [a, b]$ lies in an 
open interval U with the property that 

\[ x, y \in U \implies |f(x) - f(y)| < \varepsilon. \tag{$*$} \]

The set of all open intervals U with property (*) therefore covers [a, b]. It follows, 
by the Heine–Borel theorem of Sect. 3.6, that [a, b] is covered by finitely many open 
intervals $U_1, U_2, \ldots, U_n$, each with property (*). 

Let $c_1 < c_2 < \cdots < c_m$ be the endpoints of $U_1, U_2, \ldots, U_n$ lying between a and b. 
Also let $a = c_0$ and $b = c_{m+1}$. Since each $c_i \in$ some $U_k$, the open intervals on either 
side of $c_i$, $(c_{i-1}, c_i)$ and $(c_i, c_{i+1})$, are contained in $U_k$. So if we let 

\[ \delta = \text{minimum length among the intervals } (c_i, c_{i+1}), \]

we have 

\[ |x - y| < \delta \implies x, y \in \text{same } U_j \implies |f(x) - f(y)| < \varepsilon, \]

as required to show that f is uniformly continuous on [a, b]. □ 

This proof also has the following more general consequence, for the compact 
sets introduced in Sect. 3.6. The consequence will become useful when we obtain a 
clearer view of compact sets in Sect. 5.4. 


Corollary 2. A continuous function on a compact set is uniformly continuous. 


Proof. Suppose f is continuous on a compact set K. For each $x_0 \in K$ and each $\varepsilon > 0$ 
continuity implies there is a $\delta$ such that 

\[ |x - x_0| < \delta \implies |f(x) - f(x_0)| < \varepsilon/2. \]

Taking all such $x_0$ and $\delta$, we get a covering of K by the open intervals $(x_0 - \delta, x_0 + \delta)$. 
By compactness, finitely many of these intervals also cover K. We can then argue as 
in the proof above. □ 


Exercises 


Since a curve with endpoints is a continuous function on a compact set, namely [0,1], we can 
deduce some general properties of such curves from uniform continuity. 


4.7.1 For any curve in the plane $f : [0,1] \to \mathbb{R}^2$ and any $\varepsilon > 0$ show that there are values 
$t_1, t_2, \ldots, t_n \in [0, 1]$ such that the section of the curve between $f(t_i)$ and $f(t_{i+1})$ lies within a 
circle of radius $\varepsilon$. 

4.7.2 Deduce from Exercise 4.7.1 that any curve with endpoints is the uniform limit of a sequence 
of polygons. 


Suppose we want to define curves without endpoints, in order to cover curves such as the 
parabola. 


4.7.3 Propose a suitable definition, involving the concept of a continuous function. 
4.7.4 Illustrate your definition in the case of the parabola. 


4.8 The Riemann Integral 


The concept of uniform continuity fits like a glove onto the concept of Riemann 
integral used in basic calculus. Recall that the definition of $\int_a^b f(x)\,dx$ involves the 
following setup, as shown in Fig. 4.7: 

1. A closed interval [a, b] on which f(x) is defined. 

2. A division of [a, b] by finitely many points $c_i$ with $a < c_1 < c_2 < \cdots < c_m < b$. 

3. Lower and upper approximations to f by step functions, constant on each interval 
$(c_i, c_{i+1})$. The lower approximation equals the minimum $m_i$ of f on each interval, 
and the upper approximation equals the maximum, $M_i$. 

4. Lower and upper approximations to the integral, $\sum_i m_i(c_{i+1} - c_i)$ and 
$\sum_i M_i(c_{i+1} - c_i)$, called Riemann sums. 

5. The integral exists if the difference between the two Riemann sums (made of 
rectangles like the one shown shaded in Fig. 4.7) can be made arbitrarily small. 


A uniformly continuous f is tailor-made for this setup, because we can obtain 
finitely many intervals on which the difference between the minimum and maximum 
of f is less than $\varepsilon$. Consequently the difference in area between the upper and lower 
approximations is less than $(b - a)\varepsilon$, which can be made arbitrarily small. Therefore, 
since a continuous function on a closed interval is uniformly continuous, we have 
the theorem: 


Integrability of continuous functions. If f is continuous on [a, b] then $\int_a^b f(x)\,dx$ 
exists. □ 
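
The squeeze between lower and upper Riemann sums can be watched numerically. To keep the minima and maxima exact, the sketch below assumes an increasing function, so that the minimum on each subinterval is taken at its left end and the maximum at its right end, and it uses equal subintervals; both are simplifying assumptions, and `riemann_sums` is a hypothetical helper.

```python
from fractions import Fraction

def riemann_sums(f, a, b, n):
    """Lower and upper Riemann sums of an increasing f on [a, b],
    using n equal subintervals (min at left ends, max at right ends)."""
    width = Fraction(b - a, n)
    points = [a + i * width for i in range(n + 1)]
    lower = sum(f(left) * width for left in points[:-1])
    upper = sum(f(right) * width for right in points[1:])
    return lower, upper

def square(x):
    return x * x

# The gap between the sums for f(x) = x^2 on [0, 1] equals (f(1) - f(0))/n = 1/n.
gaps = []
for n in (10, 100, 1000):
    lower, upper = riemann_sums(square, 0, 1, n)
    gaps.append(upper - lower)
```

The gaps are 1/10, 1/100, and 1/1000, and every lower and upper sum brackets the value 1/3 of the integral.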


However, there is not a perfect match between continuous functions and Riemann 
integrable functions, because certain discontinuous functions are also Riemann 
integrable. An easy example is the function 

\[
f(x) =
\begin{cases}
1 & \text{if } x = 0 \\
0 & \text{if } x \ne 0,
\end{cases}
\]

the integral of which equals 0 on any interval. This is because the lower approximation 
is the constant zero function, and the upper approximation can be taken to be 
zero everywhere except on an arbitrarily small interval $(-\varepsilon, \varepsilon)$. 

In fact, a function with a dense set of discontinuities can be Riemann integrable 
(see exercises), though not all such functions are. The Dirichlet function mentioned 
in Sect. 4.2 is not Riemann integrable. Because of this, a more general concept of 
integrability is desirable. The best-known general integral, the Lebesgue integral, is 
based on a general concept of measure, which will be discussed in Chap. 9. One of 
the beauties of Lebesgue measure, besides its generality, is that it gives a precise 
criterion for a bounded function to be Riemann integrable. Namely, a bounded f 
is Riemann integrable if and only if f is continuous everywhere except on a set of 
measure zero. We prove this result in Sect. 9.5. 

Fig. 4.7 Setup for the Riemann integral 


4.8.1 The Fundamental Theorem of Calculus 


The fundamental theorem of calculus, roughly speaking, states that the derivative 
of the integral of a function f equals f. Various versions of the theorem exist, 
depending on the nature of the integral and the functions f. The simplest version, 
which we will now prove, concerns the Riemann integral of a continuous function f. 


Fundamental theorem of calculus. If f is continuous and F(x) is the Riemann 
integral $\int_a^x f(t)\,dt$, then $F'(x) = f(x)$. 


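Proof. By the definition of derivative, 

\[
\begin{aligned}
F'(x) &= \lim_{h \to 0} \frac{F(x+h) - F(x)}{h} \\
      &= \lim_{h \to 0} \frac{\int_a^{x+h} f(t)\,dt - \int_a^x f(t)\,dt}{h} \\
      &= \lim_{h \to 0} \frac{1}{h} \int_x^{x+h} f(t)\,dt.
\end{aligned}
\]

Now it follows from the definition of Riemann integral that 

\[ hm \le \int_x^{x+h} f(t)\,dt \le hM, \]

where m and M are the minimum and maximum of f(t) for $t \in [x, x+h]$. 
Consequently, 

\[ m \le \frac{1}{h} \int_x^{x+h} f(t)\,dt \le M, \]

and as $h \to 0$ both $m, M \to f(x)$ by the continuity of f. Thus, 

\[ F'(x) = \lim_{h \to 0} \frac{1}{h} \int_x^{x+h} f(t)\,dt = f(x). \] □ 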
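
The key step of the proof, that the average of f over [x, x + h] approaches f(x), can be illustrated numerically. The sketch below is only an approximation under stated assumptions: the integral is estimated by a midpoint rule with 10,000 steps, f is taken to be the cosine function, and x = 1; `integral` and `average_over` are hypothetical helpers.

```python
import math

def integral(f, a, b, steps=10_000):
    """Midpoint-rule approximation to the integral of f from a to b."""
    width = (b - a) / steps
    return width * sum(f(a + (i + 0.5) * width) for i in range(steps))

def average_over(f, x, h):
    """Approximates (1/h) times the integral of f from x to x + h."""
    return integral(f, x, x + h) / h

x = 1.0
errors = [abs(average_over(math.cos, x, h) - math.cos(x)) for h in (0.1, 0.01, 0.001)]
```

The entries of `errors` decrease roughly in proportion to h, in line with the squeeze between m and M in the proof.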
Fig. 4.8 Riemann sums from 1 to a and from b to ab 


Exercises 


The defining property of the logarithm, log ab = log a + log b, can be proved directly from the 
definition of log c as a Riemann integral. We define 

\[ \log c = \int_1^c \frac{dx}{x} \]

and consider the integral of 1/x from 1 to a and from b to ab. This amounts to comparing Riemann 
sums for the two areas shown under the curve in Fig. 4.8. We use Riemann sums obtained by 
dividing both [1, a] and [b, ab] into n equal parts (Fig. 4.8 shows the case n = 4). 


4.8.1 Show that the rectangles in the Riemann sums from 1 to a have exactly the same areas as 
their counterparts in the Riemann sums from b to ab. 
4.8.2 Deduce from Exercise 4.8.1 that $\int_1^a \frac{dx}{x} = \int_b^{ab} \frac{dx}{x}$. 
4.8.3 Deduce from Exercise 4.8.2 that log ab = log a + log b. 


Now we show that the Thomae function $t(x)$ defined in the exercises to Sect. 4.2, despite its 
many discontinuities, is Riemann integrable. The proof depends on the fact that unequal intervals 
are allowed. 


4.8.4 By subdividing [0,1] into unequal subintervals in such a way that “large” values of $t(x)$ 
are enclosed in “narrow” subintervals, show that the Riemann sums for $t(x)$ can be made 
arbitrarily small. 

4.8.5 Deduce that $\int_0^1 t(x)\,dx = 0$. 
4.8.6 Show, on the other hand, that $t(x)$ does not satisfy the fundamental theorem of calculus. 
Namely, if $F(x) = \int_0^x t(u)\,du$ then $F'(x) = t(x)$ only for irrational x. 


4.9 Historical Remarks 


From its beginnings in the seventeenth century, calculus was supposed to deal 
with continuous phenomena. For Newton, in particular, the basic phenomenon was 
continuous motion, and the basic problems were the following two: 




1. Given the length of space continuously (that is, at every time), to find the speed of motion 
at any time proposed. 

2. Given the speed of motion continuously, to find the length of space described at any time 
proposed. 


Newton (1671), p. 71. 


In our language, these are the problems of differentiating and integrating continuous 
functions. In Problem 1 we are given distance $d(t)$ and have to find the speed $d'(t)$. 
In Problem 2 we are given the speed $v(t)$ and have to find the distance traveled by 
time T, $\int_0^T v(t)\,dt$. The mental picture of continuous motion makes it plausible that 
$d'(t)$ exists for any continuous distance function $d(t)$, and that $\int_0^T v(t)\,dt$ exists for 
any continuous speed function $v(t)$. But, as we now know, only the second of these 
statements is true. 

In fact, as calculus evolved, the notion of continuous function expanded, from 
being identical with the notion of differentiable function until it included functions 
that are nowhere differentiable. The shift in meaning was partly due to expansion of 
the function concept, and partly due to the tardy development of the limit concept, 
without which a precise definition of continuity was not possible. 

As we saw in Sects. 1.5 and 1.9, functions were originally dependent on 
formulas, and the first formulas considered were indeed differentiable. The concept 
of “formula” expanded with the discovery of Fourier series, which could express 
functions that were clearly not differentiable at all points. Take the triangular wave 
function of Sect. 1.5, for example, which clearly has infinitely many “corners” at 
which no tangent exists. 

Bolzano (1817) first gave a definition via continuity at a point, essentially as we 
do today, and Cauchy (1821) rediscovered the concept in the first comprehensive 
and rigorous course on analysis. Cauchy’s course included precise concepts of limit, 
convergence of series, and continuity, and also a concept of integral that enabled 
him to prove that every continuous function (on a closed interval) is integrable. At 
this time it was still believed that continuous functions are differentiable, except 
perhaps at a “few” exceptional points. The incentive to study more “pathological” 
continuous functions came from the theory of Fourier series. 

Fourier (1822) discovered that, under certain conditions, a function f could be 
expressed in the form 

\[ f(x) = \frac{1}{2} a_0 + \sum_{n=1}^{\infty} (a_n \cos nx + b_n \sin nx), \]

where the coefficients are the integrals 

\[ a_n = \frac{1}{\pi} \int_{-\pi}^{\pi} f(x) \cos nx \,dx \quad \text{and} \quad b_n = \frac{1}{\pi} \int_{-\pi}^{\pi} f(x) \sin nx \,dx. \]


The conditions for these formulas to be valid were not clear, and Dirichlet (1829) 
was the first to prove a general theorem. He showed that the formulas for $a_n$ and 
$b_n$ are valid provided that f on $(-\pi, \pi)$ is continuous and piecewise monotonic. The 
latter condition means that $(-\pi, \pi)$ can be divided into finitely many subintervals, on 
each of which f is either nondecreasing or nonincreasing. 

Fig. 4.9 Augustin-Louis Cauchy and Peter Gustav Lejeune Dirichlet 

Thus, the validity of Fourier series was not yet proved for continuous functions 
with infinitely many oscillations, such as $f(x) = x \sin \frac{1}{x}$ for $x > 0$ and $f(x) = 0$ for 
$x = 0$ (Fig. 4.10). 

Fig. 4.10 A continuous function with infinitely many oscillations 

This led to the investigation of wildly oscillating continuous functions, and 
eventually to continuous counterexamples to Fourier’s formulas. More importantly, 
it led to the discovery of nowhere differentiable functions by Weierstrass (1872). 
The first examples were based on infinite sums of trigonometric functions and 
were hard to visualize [though see Hairer and Wanner (1996), p. 265, for an 
understandable example]. The von Koch (1904) example became famous because 
of its visual appeal. 

Fig. 4.11 Giuseppe Peano and Helge von Koch 

Fig. 4.12 Example of a simple closed curve 

The Koch curve and the Peano curve showed how far the notion of continuous 
curve had evolved since the seventeenth century. By the end of the nineteenth 
century one had a simple and general definition of a continuous curve, but the 
definition covered examples more pathological (and interesting?) than originally 
intended. Still, the definition passed one important test for continuous curves: the 
Jordan curve theorem. This theorem, formulated by Jordan (1887), states that a 
simple (that is, not self-intersecting) closed curve separates the plane into two 
regions (the “inside” and “outside” of the curve). The Jordan curve theorem is 
correct, but hard to prove. The proof by Jordan was considered suspect by his 
successors, but was declared essentially correct by Hales (2007), who produced the 
first computer-checkable proof of the theorem. 

Some beautiful examples of simple closed curves (which hint at why the Jordan 
curve theorem may be hard to prove) have recently been given in the field of TSP 
art. See, for example, the web site of Robert Bosch 


www.oberlin.edu/math/faculty/bosch/tspart-page.html 


Figure 4.12 is one of his images. 


Chapter 5 
Open Sets and Continuity 


PREVIEW 


In this chapter we shift our attention from functions back to sets. The shift is 
prompted by the fact that continuous functions have a natural description in terms 
of sets: the so-called open sets. Just as continuous functions may be viewed as 
the simplest functions, open sets may be viewed as the simplest sets. And just as 
complicated functions arise from continuous functions by the limit process, 
complicated sets arise from open sets by certain operations, namely, complementation and 
countable union. 

There are in fact parallel classifications of functions and sets into levels of 
complexity, called the Baire hierarchy of functions and the Borel hierarchy of sets. 
Both are useful, and they interact usefully with each other, but sets are a little easier 
to work with. We study the classification of Borel sets in depth in Chap. 8. 

Here we begin studying the classification by looking at open sets and their 
complements, the closed sets. After describing open sets and their relationship with 
continuous functions, we introduce closed sets and focus on a particular type, the 
perfect sets. An example of a perfect set is the Cantor set C, and we show that every 
perfect set resembles C in a certain sense. 

Finally, we lay the foundation for an orderly construction of the Borel sets by 
constructing a universal open set: an open set in $\mathbb{R}^2$ whose horizontal sections 
are precisely the open sets in $\mathbb{R}$. More precisely, we do this with a convenient 
replacement for $\mathbb{R}$, the set $\mathcal{N}$ of all functions $f : \mathbb{N} \to \mathbb{N}$. $\mathcal{N}$ can be viewed as 
the set of irrational numbers in (0,1), and it avoids problems of ambiguity caused by 
the rational numbers. 


5.1 Open Sets 


An open interval on the line $\mathbb{R}$ is a special case of the concept of open set. The 
concept of open set is actually very general but for our purposes a set U is open if, 
for each point $u \in U$, the points within some positive distance $\varepsilon$ of u also belong 
to U. Thus, our concept of open set applies to spaces with a concept of distance; 
typically the n-dimensional Euclidean spaces $\mathbb{R}^n$. 

Fig. 5.1 Distance in $\mathbb{R}^2$ 

Fig. 5.2 An $\varepsilon$-neighborhood in $\mathbb{R}^2$ 

For example, in the plane $\mathbb{R}^2$, the distance from point $u_1 = (x_1, y_1)$ to point 
$u_2 = (x_2, y_2)$ is given by 

\[ |u_2 - u_1| = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}, \]

thanks to the Pythagorean theorem; see Fig. 5.1. The set of points at distance less 
than $\varepsilon$ from $u \in \mathbb{R}^2$ is an open disk of radius $\varepsilon$ and center u, shown in Fig. 5.2. 
(The boundary circle is drawn dashed to indicate that it does not belong to the 
neighborhood.) 

This concept of distance easily generalizes to $\mathbb{R}^n$, where the set of points at 
distance less than $\varepsilon$ from a point u, $N_\varepsilon(u) = \{v \in \mathbb{R}^n : |v - u| < \varepsilon\}$, is called an 
open n-ball of radius $\varepsilon$ or the $\varepsilon$-neighborhood of u. 

Given the concept of $\varepsilon$-neighborhood, we can define the concept of an open set 
as follows. 


Definition. A set $U \subseteq \mathbb{R}^n$ is called open if, for each point $u \in U$, there is an $\varepsilon > 0$ 
for which $N_\varepsilon(u)$ is contained in U. 

The basic properties of open sets follow easily from this definition. 


1. The empty set (trivially) and the whole space $\mathbb{R}^n$ are open sets. 
2. The union of any collection of open sets is open. 
3. The intersection $U_1 \cap U_2$ of two open sets $U_1$ and $U_2$ is open. 

Because, if $u \in U_1 \cap U_2$ there is an $\varepsilon_1$ for which $N_{\varepsilon_1}(u) \subseteq U_1$ and an $\varepsilon_2$ 
for which $N_{\varepsilon_2}(u) \subseteq U_2$. So if we take $\varepsilon = \min(\varepsilon_1, \varepsilon_2)$ we have an $\varepsilon$ for which 
$N_\varepsilon(u) \subseteq U_1 \cap U_2$, as required. 

4. Each open set U is the union of open n-balls. 

In particular, U is the union of the set of n-balls $N_\varepsilon(u)$ for the $u \in U$ and the $\varepsilon$ 
(depending on u) for which $N_\varepsilon(u) \subseteq U$. 

5. In fact, each open set U is the union of countably many n-balls. 

It suffices to take n-balls of rational radius centered on points u with rational 
coordinates. Any point $v \in U$ has a neighborhood $N_\varepsilon(v) \subseteq U$, and inside $N_\varepsilon(v)$ 
we can take u with rational coordinates $(r_1, r_2, \ldots, r_n)$ so close to v that $v \in N_r(u)$ 
for a rational r small enough to ensure that $N_r(u) \subseteq N_\varepsilon(v) \subseteq U$. 
Then, as we know from Sect. 3.1, the set of all (n + 1)-tuples $(r, r_1, r_2, \ldots, r_n)$ 
of members of a countable set is countable. 

6. It follows from property 5 that the set of open sets $U \subseteq \mathbb{R}^n$ is equinumerous 
with $\mathbb{R}$. 

This follows from the result of Sect. 3.3 that the set of subsets of a countable 
set (which we took to be subsets of $\mathbb{N}$ in that section) is equinumerous with $\mathbb{R}$. 


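Even in $\mathbb{R}$, open sets can be quite complex and interesting. An example is the 
complement¹ $[0, 1] - C$ of the Cantor set in the unit interval, introduced in Sect. 3.7. 
In this case we can explicitly list countably many open intervals whose union is 
$[0, 1] - C$, namely: 

\[ (1/3, 2/3), \quad (1/9, 2/9), \ (7/9, 8/9), \quad (1/27, 2/27), \ (7/27, 8/27), \ (19/27, 20/27), \ (25/27, 26/27), \]

and so on. 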
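
The list of intervals can be generated stage by stage. The sketch below assumes the standard middle-third construction of the Cantor set from Sect. 3.7 and uses exact fractions; `removed_intervals` is a hypothetical helper name.

```python
from fractions import Fraction

def removed_intervals(stages):
    """Open middle thirds removed from [0, 1] in the first `stages` stages,
    listed stage by stage and from left to right within each stage."""
    kept = [(Fraction(0), Fraction(1))]
    removed = []
    for _ in range(stages):
        next_kept = []
        for a, b in kept:
            third = (b - a) / 3
            removed.append((a + third, b - third))                # open middle third
            next_kept.extend([(a, a + third), (b - third, b)])    # closed outer thirds
        kept = next_kept
    return removed

intervals = removed_intervals(3)
total_length = sum(b - a for a, b in intervals)
```

Three stages give 1 + 2 + 4 = 7 intervals, beginning with (1/3, 2/3), (1/9, 2/9), (7/9, 8/9), and their total length is 19/27 = 1 − (2/3)^3.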


Exercises 


The open disks or balls are often called basic open sets because all open sets are obtainable from 
them as unions. In $\mathbb{R}^n$ for $n > 1$ another useful family of basic open sets is the cartesian products 
of n intervals. For example, in $\mathbb{R}^2$ these cartesian products are open rectangles 

\[ (a, b) \times (c, d) = \{(x, y) : a < x < b \text{ and } c < y < d\}. \]

These basic sets are sometimes more convenient (e.g., in measure theory; see Chap. 9), so it is 
worth checking that they give the same open sets as the open disks. 

¹Remember that in this book we use the ordinary minus sign to denote set difference. 


5.1.1 Explain why any open rectangle is a union of open disks. 
5.1.2 Explain why any open disk is a union of open rectangles. 
5.1.3 Explain why any open set $U \subseteq \mathbb{R}^2$ is a countable union of open rectangles. 


The complement of an open set is generally not open. 


5.1.4 Explain why C is not open. 
5.1.5 Show that R and the empty set are the only open subsets of R with open complements. 


5.2 Continuity via Open Sets 


When open sets are defined by e-neighborhoods, as in the previous section, they are 
very naturally aligned with the concept of continuous function. In fact, they enable 
us to define the concept of a continuous function globally, without recourse to the 
preliminary definition of “continuity at a point.” 


Continuity in terms of open sets. A function f : R → R is continuous if and only
if f^{-1}(U) is open for each open set U, where

\[ f^{-1}(U) = \{x \in \mathbb{R} : f(x) \in U\}. \]


Proof. By the ε-δ definition of continuity, f is continuous if for each a ∈ R and each
ε > 0 there is a δ > 0 such that

\[ |x - a| < \delta \;\Rightarrow\; |f(x) - f(a)| < \varepsilon. \]

In terms of neighborhoods, this says

\[ x \in N_\delta(a) \;\Rightarrow\; f(x) \in N_\varepsilon(f(a)). \]

And in terms of f^{-1}: for each a ∈ R and ε > 0 there is a δ > 0 with

\[ N_\delta(a) \subseteq f^{-1}(N_\varepsilon(f(a))). \tag{*} \]

Now if U is any open set, and if a ∈ f^{-1}(U), then f(a) ∈ U. Since U is open,
it contains some N_ε(f(a)), and so, by (*), f^{-1}(N_ε(f(a))) contains some N_δ(a). In other
words, if a ∈ f^{-1}(U), then f^{-1}(U) also contains some N_δ(a). That is, f^{-1}(U) is open.

Conversely, if f^{-1}(U) is open for every open set U, then f^{-1}(N_ε(f(a))) is an open set that
includes a, and hence some N_δ(a). This gives us a δ for each ε. □


The ε-δ definition of continuity applies to functions f : R^m → R^n, if we interpret
|x − a| as the distance between x and a in R^m, and |f(x) − f(a)| as the distance between
f(x) and f(a) in R^n. Then the argument given above shows that f : R^m → R^n is
continuous if and only if f^{-1}(U) is open (as a subset of R^m) for each open set U
in R^n.


5.2.1 The General Concept of Open Set 


In this book we are concerned only with spaces in which a concept of distance may 
be defined, and hence with open sets and continuous functions defined by means of 
ε-neighborhoods. However, now that we have seen continuity defined in terms of
open sets, it is worth mentioning that open sets can be defined without use of the 
concept of distance. The trick is to use the first three properties of open sets, given 
in Sect. 5.1, as a definition. 

Suppose that we have a set X and a collection 𝒯 of sets U ⊆ X with the following
properties.


1. The empty set and X itself are members of 𝒯.
2. The union of any members of 𝒯 is a member of 𝒯.
3. The intersection of any two members of 𝒯 is a member of 𝒯.


Then the sets U ∈ 𝒯 are called open subsets of X, and 𝒯 is called a topology
on X. Using this concept of an open set, we can talk about continuous functions, 
homeomorphisms, and so on, without depending on a concept of distance. Any other 
concepts that can be defined in terms of open sets (such as closed sets and compact 
sets—see below) are also meaningful in this general theory of open sets, which is 
called general topology. 
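
For a finite set X the three properties can be checked mechanically. The Python sketch below is an illustration only: the name is_topology is hypothetical, and the shortcut of testing pairwise unions is valid only because the collection is finite (closure under all unions then follows by induction).

```python
from itertools import combinations


def is_topology(X, T):
    """Check the three topology axioms for a finite collection T of subsets of X."""
    X = frozenset(X)
    T = {frozenset(U) for U in T}
    if any(not U <= X for U in T):
        return False  # every member of T must be a subset of X
    if frozenset() not in T or X not in T:
        return False  # property 1: the empty set and X belong to T
    for U, V in combinations(T, 2):
        if U | V not in T:
            return False  # property 2, reduced to pairwise unions (T is finite)
        if U & V not in T:
            return False  # property 3: pairwise intersections
    return True


nested = [set(), {1}, {1, 2}, {1, 2, 3}]
not_closed_under_union = [set(), {1}, {2}, {1, 2, 3}]
```

Here is_topology({1, 2, 3}, nested) holds, whereas the second collection fails because {1} ∪ {2} = {1, 2} is missing.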


Exercises 


The open set definition of continuity avoids having to define continuity at a point, but it is also easy 
to define continuity at a point in terms of open sets. 


5.2.1 Express the continuity of f at point a in terms of open sets containing the point f(a). 


Any set S ⊆ R has an obvious topology, called the relative topology, whose open sets are
precisely the sets S ∩ U, where U is an open subset of R.


5.2.2 Check that the sets S ∩ U satisfy the three conditions for a topology.
5.2.3 Show that the relative topology on [0, 1] has basic open sets of the form [0, b), (a, b), and
(a, 1], for a, b ∈ (0, 1).


In the case where S = C, the relative topology is particularly interesting, because it has a countable 
collection of basic open sets that arise naturally from inside C. We recall from Sect. 3.7 that the 
elements of C are those numbers in [0, 1] with ternary expansions that can be written using only the
digits 0 and 2. 


5.2.4 Consider ternary expansions of the form

\[ x = 0.02a_1a_2a_3\ldots \quad\text{where } a_1, a_2, a_3, \ldots = 0 \text{ or } 2. \]

Show that the numbers of this form make up the intersection of C with the interval
[2/9, 1/3], which is also the intersection of C with the open interval (2/9 − ε, 1/3 + ε)
for ε sufficiently small.

5.2.5 By generalizing the idea of Exercise 5.2.4, show that, for any sequence b_1 b_2 ··· b_k of digits
b_i = 0 or 2, the set

\[ F(b_1, \ldots, b_k) = \{x : x = 0.b_1 \ldots b_k a_1 a_2 a_3 \ldots \text{ for some } a_1, a_2, a_3, \ldots = 0 \text{ or } 2\} \]

is an open set in the relative topology of C.
5.2.6 Show also that the sets F(b_1, ..., b_k) are basic open sets in the relative topology for C.


It is a similar story for the set 𝒩 of irrational numbers in [0, 1]. We know from Sects. 2.7
and 2.8 that 𝒩 can be identified (via continued fractions) with the infinite sequences
(a_1, a_2, a_3, ...) ∈ N^N.


5.2.7 Show that the set

\[ G(b_1, \ldots, b_k) = \{x : x = (b_1, \ldots, b_k, a_1, a_2, \ldots) \text{ for some } a_1, a_2, \ldots \in \mathbb{N}\} \]

is an open set in the relative topology on 𝒩.
5.2.8 Show also that the sets G(b_1, ..., b_k) are basic open sets in the relative topology for 𝒩.


5.3 Closed Sets 


The complement R^n − U of an open set U in R^n is called closed. For example, a
closed interval [a, b] in R is a closed set because its complement (−∞, a) ∪ (b, ∞) is
open. The basic properties of closed sets follow from those of open sets, enumerated 
in Sect. 5.1, by taking complements. In particular: 


1. The empty set and the whole space R^n are closed.
2. An arbitrary intersection of closed sets is closed.
   If the closed sets F_i in question are the complements of open sets U_i, then
       intersection of the F_i = complement(union of the U_i)
                               = complement of an open set
                               = closed set.
3. The union of two closed sets is closed.
   If the closed sets F_1 and F_2 are the complements of U_1 and U_2, respectively,
   then
       F_1 ∪ F_2 = complement(U_1 ∩ U_2)
                 = complement of an open set
                 = closed set.
4. The set of closed sets is equinumerous with R.
   Because the complement operation gives a bijection between the set of closed
   sets and the set of open sets, which we know is equinumerous with R.


F is traditionally used to denote closed sets because it is the initial letter of the 
French word fermé, meaning closed. On the same grounds, one might expect O to be 
used to denote open sets, because the French word for open is ouvert. However, the 
letter O is likely to be confused with 0, which is probably why one uses the second 
letter, U, instead. 

Note that “closed” does not mean “not open,” because there are many sets that 
are neither open nor closed. A closed set is “closed” in the sense that it includes all 
its limit points. 


Closure of Closed Sets. If F is a closed set and x is a limit point of F, then x ∈ F.
Conversely, any set that includes all its limit points is closed.


Proof. Recall from Sect. 3.6 that x is a limit point of F if every ε-neighborhood of
x includes points of F other than x.

It follows that x is not in the complement of F, because the complement of F is
open and hence contains an ε-neighborhood of each of its points, and such a
neighborhood includes no points of F.

Conversely, suppose F is a set that includes all of its limit points. Then each
point y ∉ F is not a limit point of F, so y has an open neighborhood disjoint from
F. In other words, the complement of F contains an open neighborhood of each of
its points, and hence is open, so F is closed. □


Exercises 


5.3.1 Show that Q is neither open nor closed in R. 

5.3.2 Show that, for a < b, the half-open interval [a, b) = {x : a ≤ x < b} is neither open nor
closed.

5.3.3 Show that the examples in Exercises 5.3.1 and 5.3.2 are countable unions of closed sets. 


On the other hand, in certain topologies many sets are both open and closed. 


5.3.4 Show that the complement of the basic open set F(0, 2) in C equals the open set F(0, 0) ∪
F(2), so F(0, 2) is closed.

5.3.5 More generally, show that the complement of any basic open set in C is a finite union of 
basic open sets. 

5.3.6 Show similarly that the complement of any basic open set in 𝒩 is a countable union of basic
open sets.


5.4 Compact Sets 


In Sect. 3.6 we proved the Heine–Borel theorem about covering [0, 1] by open
intervals. We remarked that the Heine–Borel property (“arbitrary cover contains a
finite subcover”) is the defining property of what we now call compact sets. We now
replay the proof of the Heine–Borel theorem to prove the following:


Characterization of compact sets in R. A set K ⊆ R is compact if and only if it is
closed and bounded.


Proof. First suppose that K is closed and bounded.

Since K is bounded, we can “bisect” it by bisecting the interval between an upper
and lower bound of K, and then we can proceed as in the proof of the Heine–Borel
theorem. That is, we suppose K is covered by an infinite set of intervals U_i, with
no finite subcover, and repeatedly choose a “half” with no finite subcover. In this
way we obtain a nested sequence of intervals I_1 ⊇ I_2 ⊇ I_3 ⊇ ··· with the following
properties.


1. Each I_{k+1} is half the length of I_k.

2. Each I_k contains points of K.

3. The set I_k ∩ K is covered by infinitely many of the intervals U_i, but not by finitely
many of them.


This implies that ⋂_k I_k is a single point x, which belongs to the closed set K
because it is a limit point of K. It follows that x ∈ U_i for some open interval U_i in
the collection covering K. But then I_k ⊆ U_i for sufficiently large k, contradicting the
assumption that I_k ∩ K cannot be covered by finitely many of the intervals U_i.

This contradiction establishes that a closed bounded set is compact.

Conversely, suppose that K is compact. 

If K is unbounded then we can cover it by the intervals U_n = (−n, n), but not by
finitely many of these U_n; hence K must be bounded. If x is a limit point of K but
x ∉ K then we can cover K by the intervals

\[ V_n = \left(-\infty,\; x - \tfrac{1}{n}\right) \quad\text{and}\quad W_n = \left(x + \tfrac{1}{n},\; \infty\right), \]

but not by finitely many of these. Thus, K contains all its limit points, and hence K
is closed. □


Typical examples of compact sets are the closed intervals [a, b] in R. Indeed, the 
nested interval property of Sect. 2.6 has the following generalization to compact 
sets. 


Nested compact sets. If K_1 ⊇ K_2 ⊇ K_3 ⊇ ··· are nonempty compact sets, then K_1, K_2, K_3, ...
have a point in common (and the point is unique if the size of K_n tends to zero).


Proof. The sets R − K_n are complements of closed sets, hence open. Their union
⋃_n (R − K_n) covers the complement of ⋂_n K_n, so if ⋂_n K_n is empty then ⋃_n (R − K_n)
covers R. In particular, it covers the interval that contains K_1, which we can take to
be [0, 1], without loss of generality.

Thus, we have a covering of [0, 1] by the open sets R − K_n. By compactness, [0, 1]
is also covered by finitely many of them, which we can assume to be R − K_1, R − K_2,
..., R − K_n. But

\[ \mathbb{R} - K_1 \subseteq \mathbb{R} - K_2 \subseteq \cdots \subseteq \mathbb{R} - K_n \quad\text{because}\quad K_1 \supseteq K_2 \supseteq \cdots \supseteq K_n, \]

so K_n ⊆ [0, 1] is not covered, which is a contradiction.

Our assumption that ⋂_n K_n is empty is therefore false; there are points in ⋂_n K_n,
and clearly just one point if the size of K_n tends to zero. □


Exercises 


Compact sets are generally “better behaved” than sets that are merely closed, as the nested compact 
sets property shows. 


5.4.1 Give an example to show this is not generally true for closed sets. 
5.4.2 Prove that a continuous f maps a compact K onto a compact K′.
5.4.3 Give an example to show that this is not generally true for closed sets. 


The following exercises give a proof of the Bolzano–Weierstrass theorem (Sect. 3.6) using the
compactness of [0,1] instead of the bisection argument. 


5.4.4 Suppose that S ⊆ [0, 1] is infinite but has no limit point in [0, 1]. Deduce that each x ∈ [0, 1]
lies in an open interval with no points of S other than (possibly) itself.
5.4.5 From the covering of [0,1] given by Exercise 5.4.4, derive a contradiction by compactness. 


5.5 Perfect Sets 


The concept of a perfect set goes hand-in-hand with the concept of an isolated point, 
as we see from the following: 


Definition. A point P of a closed set F is isolated if there is an ε-neighborhood of
P containing no other points of F. A nonempty closed set with no isolated points is 
said to be perfect. 


For example, N is a closed set consisting entirely of isolated points, whereas 
[0, 1] is a closed set with no isolated points. Other examples of closed sets without 
isolated points are R and the Cantor set of Sect. 3.7. We have seen that the last three 
have continuum cardinality, and in fact Cantor proved that this is true of all perfect 
sets. 


Cardinality of Perfect Sets. Every perfect set has continuum cardinality. 


Proof. The bijection from R to (0, 1) introduced in Sect. 3.3 clearly sends perfect
sets to perfect sets (no isolated points in the image), hence it suffices to find the
cardinality of a perfect set F ⊆ (0, 1). We do this by imitating the proof from
Sect. 3.7 that the Cantor set has continuum cardinality.

We construct an infinite binary tree of sets as shown in Fig. 5.3, where each set
is perfect and the two sets F_{σ0} and F_{σ1} immediately below each set F_σ in the tree
are subsets of F_σ that we can view as its “lower third” and “upper third.”

[Fig. 5.3. The tree of perfect sets: F at the root, F_0 and F_1 below it, then F_00, F_01, F_10, F_11, and so on.]


These “thirds” are found as follows for F, and it is similar for other sets in the
tree. First note that F has a minimum member x_0 and a maximum member x_1.
Namely,

x_0 = lub of x for open intervals (−1, x) in the complement of F,

and the existence of x_1 is shown similarly. Now consider the set

\[ F \cap \left[x_0,\; x_0 + (x_1 - x_0)/3\right]. \]

It is the intersection of closed sets, hence closed, and it has no isolated points except
possibly the upper endpoint x_0 + (x_1 − x_0)/3. If this point is in F and isolated, remove
it, and in any case call the resulting perfect set F_0 the “lower third of F.”

We similarly construct a perfect set F_1, the “upper third of F,” by removing the
one possible isolated point (if it is isolated) from the lower end of the closed set

\[ F \cap \left[x_1 - (x_1 - x_0)/3,\; x_1\right]. \]

We can then repeat the construction, finding perfect sets F_00 and F_01 that are
“lower third” and “upper third” of F_0, and perfect sets F_10 and F_11 that are “lower
third” and “upper third” of F_1; and so on, obtaining the tree of perfect sets shown in
Fig. 5.3.

Moreover, the length of the intervals housing the sets tends to zero as one moves
down any branch of the tree (each interval being at most 1/3 of the one before), so
there is exactly one point common to all the sets on a branch, by the nested compact
sets property from the previous section. This point is in F, since all sets in the tree
are subsets of F. Finally, the points belonging to different branches are different,
since they belong to disjoint intervals. So there are continuum-many points in F,
because there are continuum-many branches in the infinite binary tree. □
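
To see the branch-to-interval correspondence concretely, consider the special case F = C, the Cantor set, where the minimum and maximum of each set in the tree are the endpoints of the interval housing it. The Python sketch below is an illustration under that assumption only; the name housing_interval is our own.

```python
from fractions import Fraction


def housing_interval(branch, lo=Fraction(0), hi=Fraction(1)):
    """Follow a finite branch of 0s and 1s down the tree, keeping the
    lower third (bit 0) or the upper third (bit 1) of the current interval.

    Assumes F is the Cantor set, so that the endpoints of each housing
    interval are the minimum and maximum of the set it houses.
    """
    for bit in branch:
        third = (hi - lo) / 3
        if bit == 0:
            hi = lo + third   # lower third
        else:
            lo = hi - third   # upper third
    return lo, hi


left = housing_interval([0, 1])    # houses F_01
right = housing_interval([1, 0])   # houses F_10
```

The branch 01 leads to the interval [2/9, 1/3] of Exercise 5.2.4 and the branch 10 to [2/3, 7/9]; the two are disjoint, and a branch of length k always gives an interval of length 3^{-k}.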


It follows from this theorem that any set of real numbers containing a perfect set 
has the cardinality of the continuum. So if every set S of real numbers has the perfect
set property (if uncountable, S contains a perfect set), the continuum hypothesis
will follow. Indeed, Cantor set out to prove the continuum hypothesis by proving
the perfect set property for larger and larger classes of sets. In Chap. 6 we will
discuss the first step in his program: the so-called Cantor–Bendixson theorem that
says any uncountable closed set contains a perfect set.


5.5.1 Beyond Open and Closed Sets 


There are many sets in R^n that are neither open nor closed. For example, the set Q
of rational numbers is neither open nor closed in R. It is not open because it does
not contain ε-neighborhoods of its members, and it is not closed because it does not
contain some of its limit points (e.g., √2). Since Q is an important set, it is desirable
to find a classification of sets that goes beyond the open and closed sets. Since we 
are interested in sets likely to arise in analysis, we might expect the limit operation 
to generate the sets we want in some systematic manner. 

However, since we have already used the complement operation, it turns out to be 
simpler to use a native set operation, countable union, which can generate sets very 
easily in conjunction with the complement operation. For example, Q is a countable 
union of closed sets, namely

\[ \mathbb{Q} = \bigcup_{i=1}^{\infty} \{r_i\}, \]

where r_1, r_2, r_3, ... is an enumeration of the rational numbers. Each singleton set {r_i}
is closed because it is the complement of the open set (−∞, r_i) ∪ (r_i, ∞).

Countable union is also a natural operation in the theory of measure that we 
touched on in Sects. 1.7 and 3.5. The measure of a countable union of disjoint sets 
corresponds to the countable sum of their measures, which are real numbers. It 
follows, as we will show in Chap. 9, that we can measure any set built from an 
open set by complementation and countable unions. But we cannot expect to extend 
the concept of measure much further than this, because only countable sets of real 
numbers can be summed. 

The sets generated from open sets by complementation and countable union are 
called the Borel sets. The complexity of a Borel set may be measured by the “number 
of operations” required to build it, but this “number” may well be infinite. In the next 
chapter we will study the appropriate “numbers” (the ordinal numbers) for counting 
the number of steps in infinite processes such as the generation of Borel sets.


Exercises


The “middle third” set constructed in F not only has the same cardinality as the Cantor set C but
is also actually homeomorphic to it.


5.5.1 Use the common tree structure to define a bijection f between C and the “middle third”
subset of F.
5.5.2 Show that the bijection f is continuous in both directions.


5.6 Open Subsets of the Irrationals 


The foundation for the study of Borel sets is the existence of a universal open
set 𝒰, a two-dimensional open set with horizontal sections that are all the
one-dimensional open sets. For technical reasons, it is easier to construct such a set using the
set 𝒩 of irrational numbers in (0, 1), rather than the whole closed interval [0, 1]. In
this section we carry out the construction of 𝒰 in 𝒩^2.

We view 𝒩 as the set N^N of all functions f : N → N; that is, all infinite sequences
(a_1, a_2, a_3, ...) of positive integers. For each such sequence we have a real number
given by the infinite continued fraction

\[ \cfrac{1}{a_1 + \cfrac{1}{a_2 + \cfrac{1}{a_3 + \cdots}}} \]

which is between 0 and 1 and irrational because any rational number has a finite
continued fraction. (See Sects. 2.7 and 2.8 to refresh your memory of continued
fractions.) Conversely, any irrational number in (0, 1) has an infinite continued
fraction of the above form, and hence corresponds to a sequence (a_1, a_2, a_3, ...)
in N^N. Thus, we can view 𝒩 as the set of all irrational numbers in (0, 1).

We can likewise view the open subsets of 𝒩 as its intersections with the open
subsets of (0, 1), which we know are unions of rational intervals. However, there is
a more natural way to generate the open subsets of 𝒩, from open sets corresponding
to the finite sequences (a_1, ..., a_k). For each such sequence, let

\[ G(a_1, \ldots, a_k) = \{f \in \mathbb{N}^{\mathbb{N}} : f(1) = a_1, \ldots, f(k) = a_k\}. \]


Then G(a_1, ..., a_k) corresponds to all the continued fractions of the form

\[ \cfrac{1}{a_1 + \cfrac{1}{a_2 + \cfrac{1}{\ddots + \cfrac{1}{a_k + x}}}} \]

where x is 1/(an arbitrary infinite continued fraction) and hence x is an arbitrary
irrational number in (0, 1). This means that G(a_1, ..., a_k) is the intersection of 𝒩
with the open interval in (0, 1) whose endpoints are the rational numbers obtained
by substituting x = 0 and x = 1 in the continued fraction above.
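
The endpoints of the interval for G(a_1, ..., a_k) are easy to compute exactly. The following Python sketch is an illustration; the helper names cf_value and basic_interval are our own and not part of the text.

```python
from fractions import Fraction


def cf_value(prefix, x):
    """Evaluate 1/(a_1 + 1/(a_2 + ... + 1/(a_k + x))) exactly,
    working from the innermost term outward."""
    value = Fraction(x)
    for a in reversed(prefix):
        value = 1 / (a + value)
    return value


def basic_interval(prefix):
    """Endpoints of the open interval whose irrational members make up
    G(a_1, ..., a_k), found by substituting x = 0 and x = 1."""
    lo, hi = sorted((cf_value(prefix, 0), cf_value(prefix, 1)))
    return lo, hi


g1 = basic_interval((1,))
g12 = basic_interval((1, 2))
```

For example, G(1) consists of the irrationals in (1/2, 1) and G(1, 2) of the irrationals in (2/3, 3/4). The sorting step is needed because the values for x = 0 and x = 1 swap order as the depth k alternates between odd and even.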

It is clear that any (a_1, a_2, a_3, ...) ∈ 𝒩 belongs to all the open sets G(a_1),
G(a_1, a_2), G(a_1, a_2, a_3), ... and the size of G(a_1, ..., a_k) tends to zero as k increases,
by the convergence of continued fractions proved in Sect. 2.8. So, if O is an open
subset of 𝒩 and if (a_1, a_2, a_3, ...) ∈ O then G(a_1, ..., a_k) ⊆ O for k sufficiently large.
It follows that any open set O ⊆ 𝒩 is a union of sets of the form G(a_1, ..., a_k). For
this reason, the open sets G(a_1, ..., a_k) are called basic.


5.6.1 Encoding Open Subsets of 𝒩 by Elements of 𝒩


We know from example 8 in Sect. 3.1 that there is an enumeration of all finite
sequences (a_1, ..., a_k) of natural numbers. If (a_1, ..., a_k) is the nth sequence in some
fixed enumeration, we let G_n = G(a_1, ..., a_k). Then any open set O ⊆ 𝒩 is a union
of certain G_n, and hence

\[ O = \bigcup_{n=1}^{\infty} G_{f(n)} \quad\text{for some } f : \mathbb{N} \to \mathbb{N}. \]

In this way, each f ∈ 𝒩 encodes an open subset of 𝒩, and if we imagine the
elements of 𝒩 as irrationals on the y-axis, we can envisage the set encoded by
y displayed on the horizontal line at height y. Remarkably, the subset of the plane
defined in this way is an open subset of 𝒩^2 = 𝒩 × 𝒩. We call it a universal open set.
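
Any fixed enumeration of the finite sequences will do for this encoding. The Python sketch below shows one possible choice, which is an assumption of ours and not necessarily the enumeration of Sect. 3.1: sequences of positive integers are listed in order of their sum, so each one appears exactly once. The names compositions and finite_sequences are hypothetical.

```python
from itertools import count, islice


def compositions(total):
    """Yield every tuple of positive integers whose entries sum to `total`."""
    if total == 0:
        yield ()
        return
    for first in range(1, total + 1):
        for rest in compositions(total - first):
            yield (first,) + rest


def finite_sequences():
    """Enumerate all nonempty finite sequences of positive integers,
    grouped by the sum of their entries."""
    for total in count(1):
        yield from compositions(total)


first_seven = list(islice(finite_sequences(), 7))
```

With this choice the enumeration begins (1), (1, 1), (2), (1, 1, 1), (1, 2), (2, 1), (3), so that G_1 = G(1), G_2 = G(1, 1), G_3 = G(2), and so on.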


Universal open set. There is an open subset 𝒰 of 𝒩^2 whose sections

\[ \mathcal{U}(y) = \{x : (x, y) \in \mathcal{U}\} \]

are all the open subsets of 𝒩.


Proof. To define 𝒰 we interpret each y ∈ 𝒩 as a function y : N → N via the
continued fraction for y. Then let

\[ (x, y) \in \mathcal{U} \iff x \in G_{y(n)} \text{ for some } n. \]

Thus,

\[ \mathcal{U}(y) = \{x : x \in G_{y(n)} \text{ for some } n\} = \bigcup_{n=1}^{\infty} G_{y(n)} \]

is the open subset of 𝒩 encoded by y, as described above. As y runs through all the
elements of 𝒩, the unions ⋃_n G_{y(n)} run through all the open subsets of 𝒩.

To see why 𝒰 is an open subset of 𝒩^2, observe that

\[ \mathcal{U} = \bigcup_{n=1}^{\infty} H_n, \quad\text{where } H_n = \{(x, y) : x \in G_{y(n)}\}. \]

Since 𝒰 is the union of the sets H_n, it suffices to prove that each H_n is open. We do
this by showing

\[ (x_0, y_0) \in H_n \;\Rightarrow\; (x, y) \in H_n \text{ for all } (x, y) \text{ sufficiently close to } (x_0, y_0). \]

Well, if y is sufficiently close to y_0, the continued fractions for y and y_0 agree to a
given depth n, so y and y_0 agree as functions up to a given argument n, which means

\[ G_{y(n)} = G_{y_0(n)} \text{ for } y \text{ sufficiently close to } y_0. \]

Also, since G_{y(n)} = G_{y_0(n)} is an open set,

\[ x_0 \in G_{y(n)} \;\Rightarrow\; x \in G_{y(n)} \text{ for } x \text{ sufficiently close to } x_0. \]

Putting these two facts together, we get

\[
\begin{aligned}
(x_0, y_0) \in H_n &\Rightarrow x_0 \in G_{y_0(n)} \\
&\Rightarrow x \in G_{y(n)} \text{ for } (x, y) \text{ sufficiently close to } (x_0, y_0) \\
&\Rightarrow (x, y) \in H_n \text{ for } (x, y) \text{ sufficiently close to } (x_0, y_0),
\end{aligned}
\]

as required. □


Exercises 


The above result may be used to construct a universal open set in [0, 1] × [0, 1].

5.6.1 Show that the closed set ℱ = 𝒩^2 − 𝒰 ⊆ 𝒩^2 is universal in the sense that its sections

\[ \mathcal{F}(y) = \{x : (x, y) \in \mathcal{F}\} \]

are all the closed subsets of 𝒩.
5.6.2 Now consider the closure ℱ̄ of ℱ in [0, 1] × [0, 1], obtained by adding all the limit points
of ℱ in [0, 1] × [0, 1]. Show that the sections

\[ \overline{\mathcal{F}}(y) = \{x : (x, y) \in \overline{\mathcal{F}}\} \]

are all the closed subsets of [0, 1], and hence that ℱ̄ is a universal closed set for [0, 1].
5.6.3 Deduce from Exercise 5.6.2 that [0, 1] × [0, 1] − ℱ̄ is a universal open set.


[Fig. 5.4. Felix Hausdorff]


5.7 Historical Remarks 


According to Ferreirós (1999), p. 139, Dedekind around 1870 developed the ideas
of open and closed sets in analysis, but did not publish them. Later they were 
rediscovered by Peano (1887) and Jordan (1893). Cantor (1884) took up the study 
of closed sets, as the first stage in his program to prove the continuum hypothesis by 
showing that each uncountable set of reals contains a perfect subset, and hence has 
continuum cardinality. He took no interest in open sets, presumably because their 
cardinality is obvious. 

The idea of characterizing continuous functions f as those such that f^{-1} of
any open set is open is due to Hausdorff (1914). In the same book, Hausdorff
also introduced the concept of a topological space, as one with a system of “open
sets” with the three characteristic properties listed in Sect. 5.2. Actually, Hausdorff
included a fourth property, stating that any two distinct points lie in disjoint open sets.
Topological spaces with the fourth property are now called Hausdorff spaces.

Hausdorff also investigated the Borel sets, which arise from the open sets by
the operations of complement and countable union. In Hausdorff (1916) he showed
that the Borel sets have the perfect set property, thus carrying Cantor’s program
to a much higher level. The same result was proved independently by Alexandrov
(1916). Cantor’s program has since been pushed further, but not to all subsets of R.
Nevertheless, it remains viable in the sense that it is consistent with the usual axioms
of set theory for each uncountable set of reals to have a perfect subset. Just what the
“usual axioms” are will be explained in the next chapter. Suffice it to say, at this point,
that there is a “model” of the set theory axioms in which every uncountable set of
reals has a perfect subset. The model is due to Solovay (1970), and we say more
about it in Sect. 6.8.


Chapter 6 
Ordinals 


PREVIEW 


To formalize the idea of “counting past infinity,” we first need a clear idea of the
numbers 0, 1, 2,3,... involved in ordinary counting. It appears that the set concept 
is the simplest idea that can serve as a foundation for counting through, and beyond, 
the finite numbers. The sets involved in the infinite counting process are called 
ordinal numbers or simply ordinals, and the natural numbers are represented by 
sets called the finite ordinals. 

The finite ordinals can be defined with almost ridiculous ease as follows, using 
just the concepts of set and membership: 


0 = empty set, 
1 = {0} (the set with member 0), 
2 = {0,1} (the set with members 0 and 1), 


and so on. With this definition, each finite ordinal is the set of all its predecessors,
and m < n if and only if m ∈ n. Thus, the < relation is simply membership. It is then
natural to take the first infinite ordinal to be

\[ \omega = \{0, 1, 2, 3, \ldots\}, \]

since its members are precisely the finite ordinals.

This is the right idea, but it involves the assumption that infinite sets exist. Further
assumptions about sets are required to push the idea further, to ordinal numbers
that are not merely infinite but uncountable. In fact, we end up with a collection of
assumptions for sets in general, called the Zermelo–Fraenkel (ZF) axioms.


6.1 Counting Past Infinity


Recall from Sect. 5.5 that an isolated point of a closed set F is a point P ∈ F with an
ε-neighborhood containing no other points of F, and that a closed set F is perfect
if it contains no isolated points. These concepts suggest the possibility of finding
a perfect subset (if it exists) of a closed set F by repeatedly removing all isolated
points. Before we state a theorem to this effect (the Cantor–Bendixson theorem of
Sect. 6.4), it is well to be aware of what can happen when we “repeatedly” remove
isolated points from a closed set.

Cantor (1872) called the result of removing the isolated points from a closed
set F the derived set F′ of F, and he noticed that there are closed sets on which
the derived set operation ′ may be repeated many times. In fact, the operation ′
may be repeated an infinite number of times, and more. That is, after an infinite
sequence of applications of the operation ′, some members of F can remain, so ′ can
be applied again, perhaps infinitely often. To cope with this unprecedented situation
Cantor developed set theory, and particularly the theory of ordinal numbers, in order
to describe situations in which it is natural to count “past the finite numbers” or
transfinitely.

Granted that the sequence of operations ′ may be infinitely long, one still feels
that it must eventually be completed, leaving a closed set F with no isolated
points. The problem is to describe the number α of times the operation ′ must be
applied and, hopefully, to show that α is countable (so as to show that the set of
points removed from F is countable).

To give an idea why the ′ operation may be applied many times, we show how to
build more and more complicated closed sets F ⊆ R. We start with

\[ F_1 = \left\{ \tfrac{1}{2}, \tfrac{3}{4}, \tfrac{7}{8}, \tfrac{15}{16}, \ldots, 1 \right\}. \]

F_1 is rather easy to visualize, but we also include a picture (Fig. 6.1) in order to
introduce a graphic device that will be useful for more complicated sets. Each point
of F_1 lies in the middle of a vertical line (which is easier to see than the point
itself) and we make the lines shorter as the points get closer together.

The operation ′ can be applied exactly twice to F_1, because each of its points
except 1 is isolated. But if we make each isolated point of F_1 the limit point of a
new sequence of isolated points, as in the set F_2 shown in Fig. 6.2, then F_2′ = F_1, so
the ′ operation can be applied to F_2 exactly three times.


[Fig. 6.1. The set F_1 with derived set {1}]

[Fig. 6.2. The set F_2 with derived set F_1]


Similarly, we can make each isolated point of F_2 the limit point of a sequence of
isolated points of a set F_3, so that the operation ′ applies exactly four times to F_3,
and so on. For each natural number n we can construct a closed set F_n to which the
operation ′ applies exactly n + 1 times.

This is only the beginning. We now construct a set F_ω to which the operation ′
applies infinitely often, by arranging that, in F_ω,

    1/2 is the limit of a set like F_1 lying between 0 and 1/2,
    3/4 is the limit of a set like F_2 lying between 1/2 and 3/4,
    7/8 is the limit of a set like F_3 lying between 3/4 and 7/8,
    ...

This can obviously be done by suitably scaling and translating the sets
F_1, F_2, F_3, .... Then

    One application of ′ removes all points in F_ω that are < 1/2,
    Two applications of ′ remove all points in F_ω that are < 3/4,
    Three applications of ′ remove all points in F_ω that are < 7/8,
    ...

so we can say that the operation ′ applies to F_ω as many times as there are natural
numbers. This number of applications is denoted by ω (and this is why we have
already used ω for the subscript of the corresponding closed set). And, in fact, to
remove the limit point 1 of F_ω we need to apply the operation ′ once more after
as many steps as there are natural numbers. Naturally, we denote this number of
applications by ω + 1.

The numbers ω and ω + 1 are the first members of what Cantor called the second
number class. Today, we call these numbers (infinite) countable ordinals. (The first
number class consists of the finite ordinals 0, 1, 2, ....)


Exercises


Several operations we have already seen in this book invite “transfinite continuation” like that we
have just seen for the ′ operation.


6.1.1 Diagonalization of real numbers. Given real numbers x_1, x_2, x_3, ..., we can use the diagonal
argument to get a new number, which we might call x_ω. Then we can also diagonalize the
sequence x_ω, x_1, x_2, x_3, ... to get x_{ω+1}. See how far you can continue.

6.1.2 Diagonalization of integer functions. Suppose we have a sequence of increasing functions
f_i : N → N, each of which grows faster than the one before. That is, f_{i+1}(n)/f_i(n) → ∞ as
n → ∞. Define a function f_ω that grows faster than each f_i.

6.1.3 Classes of functions obtained as limits. Let B_0 = {continuous real functions} and B_{n+1} =
{limits of functions in B_n}. Supposing that each B_{n+1} has members not in B_n, how might B_ω
be defined?


6.2 What Are Ordinals? 


In the previous section and its exercises we have seen several kinds of operation that 
can be applied infinitely often. The sequence of applications may be not merely 
infinite, but “longer than the sequence of natural numbers.” We can introduce 
symbols ω, ω + 1, ω + 2, ... for the numbers (“ordinals”) that measure the length of
these infinite sequences, but the meaning of the symbols will remain hazy until we 
have a precise definition of what ordinals are. Cantor used set theory in an informal 
way to study ordinals, but von Neumann (1923) was the first to provide a definition 
of ordinal numbers as a particular kind of set. Von Neumann’s definition is so elegant 
that it throws new light on the natural numbers—the finite ordinals—as well. 


6.2.1 Finite Ordinals 


The number 0 is defined to be the empty set, denoted by $\{\,\}$ or $\emptyset$. Then $1, 2, 3, \ldots$ are
defined successively as follows:


$0 = \{\,\}$,
$1 = \{0\}$,
$2 = \{0, 1\}$,
$3 = \{0, 1, 2\}$,
$\vdots$


In a nutshell, $n + 1$ is the set with members $0, 1, 2, \ldots, n$, so each nonzero
natural number is the set of its predecessors. This magical definition, which
seems to make numbers out of nothing, also connects the native concepts of set
theory—membership and union—with the native concepts of number theory—order 
and successor. Indeed, we have: 


• For any natural numbers $m$ and $n$, $m < n \Leftrightarrow m \in n$.
• The successor function is $S(n) = n \cup \{n\}$.


As we saw in Sect. 2.2, Grassmann discovered that number theory can be based 
on the successor function, by using successor to define addition and multiplication. 
By combining Grassmann’s conception of number theory with von Neumann’s 
concept of natural number, we see that number theory is part of set theory. In fact, 
Ackermann (1937) showed that number theory is essentially identical to finite set 
theory. We will return to this surprising view of number theory in Sect. 6.6, when 
we have developed a clearer picture of what set theory actually is. 
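
The finite ordinals can be built quite literally on a computer. The following is a minimal sketch, assuming Python's immutable `frozenset` as a stand-in for "set"; the names `successor`, `von_neumann`, and `add` are illustrative and not from the text. It checks that $m < n$ coincides with $m \in n$, and defines addition from successor in the manner of Grassmann.

```python
def successor(n: frozenset) -> frozenset:
    """S(n) = n ∪ {n}."""
    return n | frozenset([n])


def von_neumann(k: int) -> frozenset:
    """Return the set that represents the natural number k."""
    n = frozenset()  # 0 is the empty set
    for _ in range(k):
        n = successor(n)
    return n


def add(m: frozenset, n: frozenset) -> frozenset:
    """Addition defined from successor: m + 0 = m, m + S(p) = S(m + p)."""
    if not n:
        return m
    # The predecessor p of a nonzero n is its greatest member,
    # which is the member with the most elements.
    p = max(n, key=len)
    return successor(add(m, p))


# Each number is the set of its predecessors.
print(von_neumann(3) == frozenset(von_neumann(k) for k in range(3)))  # True

# m < n exactly when m is a member of n.
print(all((von_neumann(m) in von_neumann(n)) == (m < n)
          for m in range(6) for n in range(6)))  # True

# 2 + 3 = 5, computed purely with sets.
print(add(von_neumann(2), von_neumann(3)) == von_neumann(5))  # True
```

The sketch only reaches finite ordinals, since a `frozenset` must be finite; the infinite set $\omega$ has no such direct representation.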

Our immediate task is to explain the concept of infinite ordinals, which will force 
us to examine the set concept more closely. 


6.2.2 Infinite Ordinals: Successor and Least Upper Bound 


To define infinite ordinals, we develop one of the key ideas of the previous section: 
an ordinal is the set of all its predecessors. Given that the infinite set 


$\omega = \{0, 1, 2, 3, \ldots\}$


of all natural numbers exists, we can say that $\omega$ is the least infinite ordinal because
its predecessors (i.e., members) are all the finite ordinals. We can also say that $\omega$
is the least upper bound of the finite ordinals, because it is greater than them all,
but no smaller ordinal is. Thus, the step from finite to infinite ordinals demands
the existence of the infinite set $\omega$. This set is in fact the foundation of all infinite
set theory, but for now we will be content to show how $\omega$ is the foundation of the
infinite ordinal numbers.

The successor operation $S(x) = x \cup \{x\}$ applies to any set. So, starting with $\omega$, we
can generate an infinite sequence of infinite ordinals:


$\omega + 1 = \{0, 1, 2, 3, \ldots, \omega\}$,
$\omega + 2 = \{0, 1, 2, \ldots, \omega, \omega + 1\}$,
$\omega + 3 = \{0, 1, 2, \ldots, \omega, \omega + 1, \omega + 2\}$,
$\vdots$


Then, by embracing¹ all the sets created so far in a single set, we obtain their least
upper bound


$\omega \cdot 2 = \{0, 1, 2, \ldots, \omega, \omega + 1, \omega + 2, \ldots\}$.


Another sequence of successors then leads to


$\omega \cdot 3 = \{0, 1, 2, \ldots, \omega, \omega + 1, \omega + 2, \ldots, \omega \cdot 2, \omega \cdot 2 + 1, \omega \cdot 2 + 2, \ldots\}$,


and we similarly obtain $\omega \cdot 4, \omega \cdot 5, \ldots$.


¹Happily, set theory literally uses the braces { and } to comprehend a collection of objects as a set.

The sequence of ordinals $\omega, \omega \cdot 2, \omega \cdot 3, \ldots$ also has a least upper bound, which
we obtain by collecting all of these sets, and all their predecessors, into a single set
$\omega^2$. Since predecessors are members, this least upper bound is simply the union of
the sets $\omega, \omega \cdot 2, \omega \cdot 3, \ldots$. That is,


$\omega^2 = \omega \cup \omega \cdot 2 \cup \omega \cdot 3 \cup \cdots$.


As we ascend to larger ordinals, it becomes more and more convenient to take unions 
of infinitely many sets to obtain least upper bounds. Indeed, the union of any set of 
ordinals is the least upper bound of the set. 

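In this way we can grasp ordinals $\omega^2, \omega^3, \omega^4, \ldots$ and their least upper bound $\omega^\omega$;
then ordinals $\omega^\omega, \omega^{\omega^\omega}, \omega^{\omega^{\omega^\omega}}, \ldots$ and their least upper bound $\omega^{\omega^{\omega^{\cdot^{\cdot^{\cdot}}}}}$; and so on, to ever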
more dizzying heights. 

Yet, mind-boggling as these ordinals may be, they are all countable. That is, they 
are sets with countably many members, because every least upper bound operation 
applied so far involves a countable union of countable sets. 
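
To make the first stretch of infinite ordinals concrete, here is a small sketch that models the ordinals below $\omega^2$. It assumes the representation of $\omega \cdot a + b$ by the pair `(a, b)` of natural numbers, which is a modelling choice and not something the text prescribes; the helper names are likewise hypothetical. Python compares tuples lexicographically, and for these pairs that agrees with the ordinal order. The function `cantor_index` illustrates countability by assigning a distinct natural number to each pair.

```python
from typing import Tuple

Ordinal = Tuple[int, int]  # (a, b) stands for ω·a + b, an assumption of this sketch

OMEGA: Ordinal = (1, 0)


def finite(n: int) -> Ordinal:
    """The finite ordinal n, viewed as ω·0 + n."""
    return (0, n)


def succ(x: Ordinal) -> Ordinal:
    """Successor: ω·a + b  ->  ω·a + (b + 1)."""
    a, b = x
    return (a, b + 1)


def is_limit(x: Ordinal) -> bool:
    """Nonzero ordinals of the form ω·a have no immediate predecessor."""
    a, b = x
    return b == 0 and a > 0


def cantor_index(x: Ordinal) -> int:
    """Cantor's pairing function: a one-to-one map from pairs to natural numbers."""
    a, b = x
    return (a + b) * (a + b + 1) // 2 + b


# ω lies above every finite ordinal sampled here.
print(all(finite(n) < OMEGA for n in range(1000)))  # True

# ω < ω + 1 < ω·2
print(OMEGA < succ(OMEGA) < (2, 0))  # True

# ω is a limit ordinal, ω + 1 is not.
print(is_limit(OMEGA), is_limit(succ(OMEGA)))  # True False

# Distinct ordinals in a 20-by-20 sample receive distinct indices.
sample = [(a, b) for a in range(20) for b in range(20)]
print(len({cantor_index(x) for x in sample}) == len(sample))  # True
```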


6.2.3 Uncountable Ordinals


Most of the ordinals encountered in this book are countable, but it should be no 
surprise that uncountable ordinals exist. In fact, the least uncountable ordinal, $\omega_1$,
should be the set of all countable ordinals. However, to make this definition of $\omega_1$
precise we need a precise definition of ordinal and $<$. Here it is.


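Definition. An ordinal $\alpha$ is a set that is


• $\in$-transitive: that is, if $\beta \in \alpha$ and $\gamma \in \beta$, then $\gamma \in \alpha$,
• $\in$-linear: that is, if $\beta, \gamma \in \alpha$ are distinct, then either $\beta \in \gamma$ or $\gamma \in \beta$.


Also, $\beta < \alpha$ if and only if $\beta \in \alpha$.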
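
For finite sets the two conditions can be tested mechanically. The sketch below assumes sets are modelled as nested `frozenset` values; the function names are illustrative. Linearity is checked only for distinct members, since no set is a member of itself. The last line illustrates, on a finite sample, the earlier remark that the union of a set of ordinals is its least upper bound.

```python
from itertools import combinations


def is_transitive(a: frozenset) -> bool:
    """Every member of a member of a is itself a member of a."""
    return all(g in a for b in a for g in b)


def is_linear(a: frozenset) -> bool:
    """Any two distinct members of a are comparable under membership."""
    return all((b in g) or (g in b) for b, g in combinations(a, 2))


def is_ordinal(a: frozenset) -> bool:
    return is_transitive(a) and is_linear(a)


zero = frozenset()
one = frozenset([zero])
two = frozenset([zero, one])
three = two | frozenset([two])

print(is_ordinal(three))  # True

# {1} is not transitive: 0 is a member of 1 but not of {1}.
print(is_ordinal(frozenset([one])))  # False

# {0, 1, {1}} is transitive but not linear: 0 and {1} are incomparable.
odd = frozenset([zero, one, frozenset([one])])
print(is_transitive(odd), is_linear(odd))  # True False

# The union of the ordinals 1, 3, 2 is their least upper bound, 3.
print(frozenset().union(one, three, two) == three)  # True
```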


It is not hard to check that this definition is satisfied by all the ordinals mentioned 
so far. However, the nature of the sets that satisfy the definition depends on the nature 
of the membership relation $\in$, and hence ultimately on the axioms of set theory. We
discuss these axioms more fully later, but one should be mentioned here because it
is motivated by the properties of ordinals. This is the axiom of foundation, which
says that there is no infinite descending membership sequence $\cdots \in \alpha_3 \in \alpha_2 \in \alpha_1$.
Among other things, the axiom of foundation ensures that each nonempty set of ordinals has a
least member, and hence it enables definitions and proofs by induction on ordinals.
This extended form of induction is called transfinite induction. 


Exercises 


Use the definition of ordinal to verify the following. 


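6.2.1 If $\beta$ is an ordinal and $\alpha \in \beta$, then $\alpha$ is an ordinal.

6.2.2 If $\alpha$ and $\beta$ are ordinals and $\alpha \subset \beta$, then $\alpha < \beta$.

6.2.3 If $\alpha$ is an ordinal, then so is $\alpha + 1 = \alpha \cup \{\alpha\}$.

6.2.4 If $\alpha_1 < \alpha_2 < \cdots$ are ordinals, so is their lub, $\bigcup_i \alpha_i$.

6.2.5 Also verify that $\bigcup_i \alpha_i$ is indeed the lub of the $\alpha_i$, because any ordinal less than $\bigcup_i \alpha_i$ is less
than some $\alpha_i$.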


6.3 Well-Ordering and Transfinite Induction


The usual way to state the axiom of foundation, which obviously implies the 
nonexistence of infinite descending membership sequences, is the following: 


Axiom of Foundation. Each nonempty set $S$ has an $\in$-least member; that is, an
element $x \in S$ such that $y \notin x$ for any $y \in S$.


It follows, in particular, that $x \notin x$ for any set $x$, and the definition of ordinal
implies that any ordinal $\sigma$ is linearly ordered² by the membership relation $\in$. That
is, if we write the usual order symbol $<$ in place of $\in$, then any $\alpha, \beta, \gamma$ in $\sigma$ satisfy:


1. $\alpha \not< \alpha$. (Irreflexivity)
2. If $\alpha \ne \beta$ then either $\alpha < \beta$ or $\beta < \alpha$ (but not both). (Linearity)
3. If $\alpha < \beta$ and $\beta < \gamma$ then $\alpha < \gamma$. (Transitivity)


The axiom of foundation gives a fourth property that makes the linear ordering a 
well-ordering: 


4. Any nonempty subset of $\sigma$ has a least member. (Well-foundedness)


It follows that any nonempty $\sigma$ has a least member, which can only be the empty set 0. For
if the least $\alpha \in \sigma$ were not empty, any $\beta \in \alpha$ would be a lesser member of $\sigma$,
contradicting the definition of $\alpha$. It similarly follows that the least member of $\sigma - \{0\}$
is $\{0\} = 1$, and so on. Indeed, it is easy to see that $\sigma$ is the set of all ordinals less
than $\sigma$.


²We wrote down the defining properties of a linear ordering once before, in Sect. 2.5. There we
stated them as properties of $\le$, because they were motivated by the $\subseteq$ relation between lower
Dedekind cuts. Here we are motivated by the $\in$ relation, so we write them as properties of $<$.
However, it is easy to see that the two sets of properties are equivalent.

Ordinals are not the only examples of well-ordered sets, but they include the
order types of all well-ordered sets, in the following sense. 


Ordinal Representation of Well-orderings. If $<$ is a well-ordering relation on a
set $W$, then $(W, <)$ is order-isomorphic to some ordinal $\sigma$ under the $\in$ relation. That
is, there is a bijection $f : \sigma \to W$ such that


$\alpha \in \beta \Leftrightarrow f(\alpha) < f(\beta)$.

Proof. We would like to define $f$ “inductively” by saying


$f(0)$ = least member of $W$,
$f(\alpha)$ = least member of $W - \{f(\beta) : \beta < \alpha\}$,


until we reach an ordinal $\sigma$ such that $\{f(\beta) : \beta < \sigma\} = W$. However, we have not
yet proved that such a transfinite induction is valid, so we take the following more 
cautious approach. 

Consider all the bijections $f_\alpha$ (between ordinals $\alpha$ and subsets of $W$) satisfying
the following conditions:


1. $f_\alpha(\beta)$ is defined for all $\beta < \alpha$.
2. $f_\alpha(0)$ = least member of $W$.
3. For any $\gamma < \alpha$, $f_\alpha(\gamma)$ = least member of $W - \{f_\alpha(\beta) : \beta < \gamma\}$.


The set of such functions is not empty, since it includes the function $f_1$ consisting
of the single ordered pair (0, least member of $W$).

Also, any two such functions, say $f_\alpha$ and $g_\beta$, are compatible in the sense that
$f_\alpha(\gamma) = g_\beta(\gamma)$ for any $\gamma$ on which they are both defined. For if $f_\alpha(\gamma) \ne g_\beta(\gamma)$ for some $\gamma$, then
there is a least $\gamma$ for which this happens, and one sees that this least $\gamma$ contradicts
conditions 2 and 3 above. Compatibility implies that, for each $\alpha$, there is at most
one function $f_\alpha$ satisfying conditions 1, 2, and 3.

Now let $\sigma$ be the least ordinal greater than all the $\alpha$ for which $f_\alpha$ exists. By
compatibility, the union $f$ of $\{f_\alpha : \alpha < \sigma\}$ is an injection $f : \sigma \to W$. If $f$ is not onto
$W$ then $W - \{f(\alpha) : \alpha < \sigma\}$ is not empty, and we can define


$f(\sigma)$ = least member of $W - \{f(\alpha) : \alpha < \sigma\}$,


which contradicts the definition of $\sigma$.

Thus, $f$ is a bijection $\sigma \to W$, and it easily follows from conditions 2 and 3 that
$f$ is order-preserving. (If not, take the least $\alpha$ and $\beta$ such that $\alpha \in \beta$ but $f(\alpha) > f(\beta)$,
and derive a contradiction.) □
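
The inductive recipe "$f(\alpha)$ = least member of $W - \{f(\beta) : \beta < \alpha\}$" can be traced on a finite well-ordered set, where ordinary recursion suffices and no transfinite argument is needed. This is only a sketch under that finiteness assumption: Python integers stand in for the finite ordinals, and the name `enumerate_well_order` and the sample ordering of words are hypothetical.

```python
def enumerate_well_order(W, less_than):
    """Return a list f with f[a] = least member of W - {f[b] : b < a}."""
    remaining = set(W)
    f = []
    while remaining:
        # The least member is the one that nothing remaining lies below.
        least = next(x for x in remaining
                     if not any(less_than(y, x) for y in remaining))
        f.append(least)
        remaining.remove(least)
    return f


def shorter_then_alphabetical(x: str, y: str) -> bool:
    """A sample well-ordering of words: by length, then alphabetically."""
    return (len(x), x) < (len(y), y)


W = {"pear", "fig", "apple", "kiwi"}
f = enumerate_well_order(W, shorter_then_alphabetical)
print(f)  # ['fig', 'kiwi', 'pear', 'apple']

# f is a bijection from the ordinal 4 = {0, 1, 2, 3} onto W ...
print(len(f) == len(W) and set(f) == W)  # True

# ... and it preserves order: a < b exactly when f(a) < f(b).
print(all((a < b) == shorter_then_alphabetical(f[a], f[b])
          for a in range(len(f)) for b in range(len(f))))  # True
```

Here the ordinal $\sigma$ of the theorem is simply the length of the list, 4.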


This theorem shows why any well-ordered set is isomorphic to an ordinal. The 
proof also shows how one may justify defining a function by transfinite induction— 
that is, by taking the union of all functions that satisfy the induction up to a certain 
ordinal. From now on we will use inductive definitions without detailed justification.

Another important corollary of this proof is the following.

Corollary 3. For any two distinct ordinals $\mu$ and $\nu$, either $\mu \in \nu$ or $\nu \in \mu$.


Proof. As above, take the union of all bijections $f_\alpha$ from some $\alpha \le \mu$ to $\alpha \le \nu$. The
set is not empty because it includes the function $f_1$ consisting of the single ordered
pair (0, 0). If we now take $\sigma$ to be the least ordinal greater than the $\alpha$ for which $f_\alpha$
exists, then we find $\sigma = \mu$ or $\sigma = \nu$. It follows that $\mu \in \nu$ or $\nu \in \mu$. □


Exercises 


In Sect. 6.1 we gave an example of a well-ordered set of rationals with the order type $\omega$; namely,
$\{\frac{1}{2}, \frac{3}{4}, \frac{7}{8}, \ldots\}$. We also indicated, by a picture, how to construct a set of numbers with order
type $\omega^2$.


6.3.1 Give an explicit set of rational numbers with order type $\omega^2$.

6.3.2 Given a well-ordered set of rationals with order type $\alpha$, explain how to obtain a set of
rationals with order type $\alpha + 1$.

6.3.3 Given well-ordered sets of rationals with order types $\alpha_1, \alpha_2, \alpha_3, \ldots$, explain how to construct
a set of rationals with order type at least $\bigcup_i \alpha_i$.

6.3.4 Deduce, from Exercises 6.3.2 and 6.3.3 that there are sets of rationals with the order types 
of all countable ordinals. 


The inductive definitions of sum and product from Sect. 2.2 are easily extended to all ordinals 
by transfinite induction. The “induction step” must now be supplemented by a step for ordinals 
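that are not successors, the so-called limit ordinals. Here is the definition of $\alpha + \beta$ by induction
on $\beta$:


$\alpha + 0 = \alpha$,
$\alpha + (\beta + 1) = (\alpha + \beta) + 1$,
$\alpha + \gamma = \bigcup_{\beta < \gamma} (\alpha + \beta)$ for a limit ordinal $\gamma$.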


6.3.5 Using the definition of sum, prove by induction on $\gamma$ that the associative law holds for
ordinal addition: $\alpha + (\beta + \gamma) = (\alpha + \beta) + \gamma$.

6.3.6 Give an example to show that the commutative law does not hold for ordinal addition. 

6.3.7 Give an inductive definition of ordinal multiplication, and show that it satisfies the 
associative law. 


6.4 The Cantor-Bendixson Theorem


The first definition by transfinite induction that we will use is that of the sequence 
of derived sets of a closed set, which we began to discuss in Sect. 6.1. As in the 
exercises above, we use the term limit for an ordinal that is not a successor. 


Definition. If $F$ is a closed set, then $F^{(\alpha)}$ is defined for all countable ordinals $\alpha$ as
follows:

$F^{(1)} = F - \{\text{all isolated points of } F\}$,
$F^{(\alpha+1)} = F^{(\alpha)} - \{\text{all isolated points of } F^{(\alpha)}\}$,
$F^{(\lambda)} = \bigcap_{\alpha < \lambda} F^{(\alpha)}$ when $\lambda$ is a limit ordinal.


In a construction like this one, where points are removed at successor stages $\alpha + 1$,
taking the intersection is the natural thing to do at a limit stage $\lambda$, because $\bigcap_{\alpha < \lambda} F^{(\alpha)}$
omits all the points removed from $F$ at stages $\alpha + 1 < \lambda$.

The above definition makes sense even for uncountable $\lambda$, but we are about to
show that the sequence $F^{(\alpha)}$ becomes constant at some countable $\alpha$, so it is pointless
to go to uncountable stages. On the other hand, it should be clear from Sect. 6.1
and its exercises that $F^{(\alpha)}$ can continue to change up to an arbitrarily high countable
ordinal $\alpha$. Thus, the theorem on the eventual constancy of $F^{(\alpha)}$ (the famous
Cantor-Bendixson theorem) is a subtle one depending on the general concept of countable
ordinal.

The key to the proof of the Cantor-Bendixson theorem is the following theorem,
which limits the length of a well-ordered nested sequence of open sets. 


Length of a well-ordered nested sequence of open sets. If we have open sets $U_\alpha$
for $\alpha <$ some ordinal $\gamma$, and if $\alpha < \beta < \gamma \Rightarrow U_\alpha \subsetneq U_\beta$, then $\gamma$ is countable.


Proof. Since $U_\alpha \subsetneq U_{\alpha+1}$, for each $\alpha < \gamma$ we have a point $x_\alpha \in U_{\alpha+1} - U_\alpha$ and
hence a rational open interval $I_\alpha$ such that $x_\alpha \in I_\alpha \subseteq U_{\alpha+1}$. Indeed, we can define $I_\alpha$
explicitly as the first interval $I$ (in some fixed enumeration of the rational intervals)
such that $I \subseteq U_{\alpha+1}$ but $I \not\subseteq U_\alpha$.

The intervals $I_\alpha$, $I_\beta$ are necessarily different for $\alpha < \beta$, since $I_\alpha \subseteq U_\beta$ but $I_\beta \not\subseteq U_\beta$.
Hence there are only countably many ordinals $< \gamma$, because there are only countably
many rational intervals. □


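Cantor-Bendixson Theorem. If $F$ is a closed subset of $\mathbb{R}$ and $F^{(\alpha)}$ denotes the $\alpha$th
derived set of $F$, then $F^{(\alpha)} = F^{(\alpha+1)}$ for some countable $\alpha$, and hence $F^{(\alpha)}$ is either
empty or perfect.

Proof. It follows from the definition of the sets $F^{(\alpha)}$ that they are all closed and
that $F^{(\alpha)} \supseteq F^{(\beta)}$ for $\alpha < \beta$. Hence the open complements $U_\alpha$ of the $F^{(\alpha)}$ are such that
$U_\alpha \subseteq U_\beta$ for $\alpha < \beta$.

Then, since a nested well-ordered sequence of open sets has countable length, it
follows that $U_\alpha = U_{\alpha+1}$ for some countable ordinal $\alpha$ (otherwise the $U_\alpha$ would form a
strictly nested sequence of uncountable length), and hence $F^{(\alpha)} = F^{(\alpha+1)}$. □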


Exercises 


The following exercises explain why we cannot simply find the interval $I_\alpha$ inside $U_{\alpha+1} - U_\alpha$ in the
proof of the theorem on the length of a sequence of nested open sets.

6.4.1 Give an example of open sets $U, V$ with $U \subset V$ but with no open interval $I \subset V - U$.
6.4.2 For your example of open sets $U, V$ above, find an open interval $I$ such that $I \subset V$ but $I \not\subset U$.


There is an easier theorem about sequences of disjoint open sets. 


6.4.3 Show that any collection of disjoint open intervals in R is countable. 
6.4.4 Use Exercise 6.3.4 to construct, for any countable ordinal $\gamma$, disjoint half-open intervals
$[a_\alpha, a_{\alpha+1})$ for all $\alpha < \gamma$ with the properties


$a_\alpha < a_\beta \Leftrightarrow \alpha < \beta$ and $\bigcup_{\alpha < \gamma} [a_\alpha, a_{\alpha+1}) = [0, 1)$.


Since each $[a_\alpha, a_{\alpha+1})$ is homeomorphic to $[0, 1)$, it follows that $\bigcup_{\alpha < \gamma} [a_\alpha, a_{\alpha+1})$ is homeomorphic to
a structure we may call the $\gamma$-line $[0, 1) \times \gamma$. The $\gamma$-line contains a copy $[0, 1) \times \{\alpha\}$ of $[0, 1)$ for each
$\alpha < \gamma$, with copy $\alpha$ to the left of copy $\beta$ when $\alpha < \beta$, and the point 0 in copy $\beta$ is the least upper
bound of all points in the copies $\alpha$ for $\alpha < \beta$.


6.4.5 Deduce from Exercise 6.4.4 that the $\gamma$-line is homeomorphic (and order isomorphic) to $[0, 1)$
for each countable ordinal $\gamma$.

6.4.6 Similarly define the $\omega_1$-line, and explain why the $\omega_1$-line is not homeomorphic (or order
isomorphic) to $[0, 1)$.


The $\omega_1$-line is also known as the long line.


6.5 The ZF Axioms for Set Theory 


It should now be clear that the set concept is practically indispensable for the study 
of analysis in general and the real numbers in particular. Moreover, we have seen that 
even the most basic mathematical objects—the natural numbers—can be naturally 
defined as certain sets. Certain axioms for sets also appear to be indispensable. 
For example, we need to assume the existence of the empty set and an infinite set 
(most conveniently, the set of natural numbers). In this section we list the most 
commonly used axioms for set theory, with comments on their role as a foundation 
for mathematics. They are called the Zermelo-Fraenkel axioms, after Ernst Zermelo
who proposed most of them in Zermelo (1908), and Abraham Fraenkel who made an 
important amendment in Fraenkel (1922). For short, they are called the ZF axioms. 

The underlying idea of the ZF axioms is that “everything is a set”; in particular,
natural numbers, real numbers, and functions are certain kinds of sets. In line with
this conceptual economy, all relations between sets are based on membership $\in$ and
equality $=$. Thus, the language of ZF set theory is very simple: it has variables
$x, y, z, \ldots$ to denote sets, the relation symbols $\in$ and $=$, and symbols for the basic
concepts of logic: “and,” “or,” “not,” “for all,” and “there exists.” We are not going
to use formal logic symbols in this book, but it is necessary to know that they exist,
and hence that there is a mathematically precise concept of “formula in the language
of ZF set theory.” This is because the ZF axioms include an infinite list of formulas,
called the replacement schema.

Most ZF axioms describe operations for producing new sets from old, by clearly 
defined processes. Implicitly, they describe the cumulative concept of set, according
to which each set is constructed from previously defined sets (starting with the
empty set). Thus, sets arise in stages, which turn out to be ordinal number stages.
At no stage does one have a “set of all sets,” because there is always a next stage, at
which new sets arise. In this way we avoid paradoxes that arose in the early history 
of set theory. 


Extensionality. Two sets are equal if and only if they have the same members. 
It follows, for example, that 


$\{0, 1\} = \{1, 0\} = \{1, 1, 0\}$,


because each of these sets has the same members, namely, 0 and 1. 
Empty Set. There is a set with no members. 
It follows from Extensionality that the empty set (which we call 0 from now 
on) is unique. In the cumulative hierarchy of sets, 0 is at the bottom level. 
Pairing. For any sets X and Y there is a set {X, Y} whose only members are X and Y. 
From the empty set 0 that we have from the previous axiom, we can now 
construct the set $\{0, 0\}$, which equals $\{0\}$ by Extensionality. Thus, $1 = \{0\}$ occurs
at the next level of the cumulative hierarchy. 
By further use of pairing we can construct 2 = {0, 1}, but how do we construct 
3 = {0, 1,2}? See the next axiom. 
The pairing axiom gives us the unordered pair {X, Y}, but there is a trick (due 
to Kuratowski 1921) which also gives the ordered pair (X, Y). Namely, let 


$(X, Y) = \{\{X\}, \{X, Y\}\}$.

This definition, clumsy though it may be, has the essential property that

$(X_1, Y_1) = (X_2, Y_2) \Leftrightarrow X_1 = X_2$ and $Y_1 = Y_2$.
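
The essential property of the Kuratowski pair can be checked exhaustively on a small domain. In this sketch Python integers stand in for arbitrary sets $X$ and $Y$ (an assumption made only to keep the example short), and the name `kpair` is illustrative.

```python
from itertools import product


def kpair(x, y) -> frozenset:
    """Kuratowski ordered pair (x, y) = {{x}, {x, y}}."""
    return frozenset([frozenset([x]), frozenset([x, y])])


# Unordered pairs ignore order; Kuratowski pairs do not.
print(frozenset([1, 2]) == frozenset([2, 1]))  # True
print(kpair(1, 2) == kpair(2, 1))              # False

# When x = y the pair collapses to {{x}}, yet it is still determined by x.
print(kpair(1, 1) == frozenset([frozenset([1])]))  # True

# (x1, y1) = (x2, y2) exactly when x1 = x2 and y1 = y2, for all values 0..3.
print(all((kpair(x1, y1) == kpair(x2, y2)) == (x1 == x2 and y1 == y2)
          for x1, y1, x2, y2 in product(range(4), repeat=4)))  # True
```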

Union. For any set $X$ there is a set whose members are the members of members of
$X$. In the case where $X = \{Y, Z\}$, the members of the members of $X$ form what we
call the union of $Y$ and $Z$, $Y \cup Z$. This special case of union suffices to form

$3 = \{0, 1, 2\} = \{0\} \cup \{1, 2\}$

from sets previously defined by pairing, and more generally we get

$n + 1 = n \cup \{n\}$.

An important application of union for infinite sets $X$ is where $X$ is a set of
ordinals. In this case the set of members of the members $\alpha_i$ of $X$, $\bigcup_i \alpha_i$, is the
least upper bound of the ordinals $\alpha_i$.

Infinity. There is an infinite set; in fact a set that includes 0 and, along with any
member $X$, also the member $X \cup \{X\}$.

Thus, the members of this set include all the finite ordinals. However, we do 
not yet have the set w whose members are exactly the finite ordinals. To obtain it, 
we would like an axiom that guarantees the existence of “definable subsets,” 
because we can write down a definition of finite ordinal. Zermelo proposed 
such an axiom, and Fraenkel proposed something stronger, involving definable 
functions. 

Replacement. For any function definition f, the values f(x), where x is a member 
of a set X (the domain of f), form a set f(X) (the range of f). 

Replacement is actually an infinite schema of axioms, one for each two-variable
formula $\varphi(x, y)$ written in the language of ZF. Such a formula defines
a function $f$ if, for each $x \in X$, $\varphi(x, y)$ holds for exactly one $y$ [called the function
value $f(x)$].

The “definable subset” axiom used by Zermelo is the special case of 
Replacement where f maps the set X onto a subset of itself. For example, if 
we want to obtain the subset $\omega = \{0, 1, 2, \ldots\}$ from a set $Y$ whose members $y$
include $0, 1, 2, \ldots$, we define $f$ on $Y$ by letting $f(y) = 0$ if $y$ is not a finite ordinal,
and let f(y) = y otherwise. 

Power Set. For any set X there is a set P(X) whose members are the subsets of X. 

P(X) is called the power set of X, and we have already seen one way in which 
Power Set is a “powerful” axiom, in Sect. 3.8. By the diagonal argument, P(X) 
is a set of higher cardinality than $X$. In particular, $P(\omega)$ is an uncountable set.

The power set axiom is also needed to prove the existence of the least
uncountable ordinal $\omega_1$. In fact, any proof that uncountable sets exist needs the
power set axiom, because the other ZF axioms can be satisfied by a collection 
of countable sets. (Similarly, Infinity is needed to prove the existence of infinite 
sets, because the other ZF axioms can be satisfied by a collection of finite sets; 
see the exercises). 

Foundation. Every nonempty set $X$ has an $\in$-minimal member; that is, an $x \in X$ such that
$y \in x$ for no $y \in X$.

As we have already seen, Foundation guarantees that any set that is linearly
ordered by the membership relation $\in$ is in fact well-ordered by $\in$, which
simplifies the definition of ordinal.

Foundation also guarantees the cumulative set concept, by ensuring that each
set $X$ has a rank $\alpha$, an ordinal number that counts the number of applications of
the power set axiom needed to build $X$, starting from the empty set. We elaborate
on the concept of rank in the exercises and the next two sections. 


Exercises 


A collection of sets called the hereditarily finite sets is obtained by the following inductive
construction, already mentioned in the exercises to Sect. 3.8.

$V_0 = 0$, the empty set,
$V_{n+1} = V_n \cup P(V_n)$.


The union $V_\omega = \bigcup_n V_n$ is the set of hereditarily finite sets.


6.5.1 Prove by induction on $n$ that $V_{n+1} = P(V_n)$.

6.5.2 Prove by induction on $n$ that each member of $V_n$ is finite, and hence that members of
members, members of members of members, and so on, are all finite.

6.5.3 Explain why $V_\omega$ satisfies all ZF axioms except Infinity.


The following exercises explore the idea of a “set of all sets” and the contradictions to which it 
leads. 


6.5.4 If X is the set of all sets, why is P(X) contradictory? 
6.5.5 In particular, what about $Y = \{Z \in X : Z \notin Z\}$?


6.6 Finite Set Theory and Arithmetic 


As we saw in Sect. 6.2, the numbers $0, 1, 2, 3, \ldots$ can be taken to be the sets

$0 = \{\,\}$, $1 = \{0\}$, $2 = \{0, 1\}$, $\ldots$,


with the successor function $S(n) = n \cup \{n\}$. Thus, ZF can prove the existence of
the basic objects of arithmetic. In fact, if we omit Infinity from the ZF axioms, the
remaining axiom set ZF−Infinity is equivalent to the Grassmann-Dedekind-Peano
axioms mentioned in Sect. 2.2. This is because we do not need Infinity to obtain the
individual sets $0, 1, 2, 3, \ldots$ and the successor function, and the axiom of foundation
gives us definition and proof by induction.

In a little more detail, here is how the foundations of arithmetic can be established 
in ZF−Infinity.


1. The natural numbers $0, 1, 2, 3, \ldots$ are the finite ordinals. We gave the definition
of “$\alpha$ is an ordinal” in Sect. 6.2. An ordinal $\alpha$ is finite if $\alpha$ and all of its members,
when nonempty, each have a greatest member, where $\gamma$ is the greatest member of $\beta$ if $\gamma \in \beta$ and
$\gamma \notin \delta$ for any $\delta \in \beta$.

2. Induction amounts to the principle that, if some natural number $n$ has property $P$,
then there is a least natural number with property $P$ (for properties $P$ definable in
the language of ZF). Since $n$ is a finite ordinal, the numbers $\le n$ are the members
of $S(n) = n \cup \{n\}$, so the least number with property $P$ is the least member of
the set


$\{m : m \in n \cup \{n\}$ and $m$ has property $P\}$.


This least member exists by the foundation axiom. 

3. Since induction is available, we can define sum and product by induction, as in
Sect. 2.2. All other functions and theorems of arithmetic are then obtainable by
induction.

Of course, the subject matter of ZF−Infinity is more than just the finite ordinals
$0, 1, 2, 3, \ldots$, but not much more, as it happens. The only sets that ZF−Infinity can
prove to exist are those obtained from the empty set 0 by iterating the power set
operation a finite number of times. (These are the members of the sets $V_n$ discussed
in the exercises to the previous section, where it was shown that they satisfy all the
axioms of ZF−Infinity. Consequently, the sentence “every set belongs to some $V_n$”
is consistent with ZF−Infinity.)

These finite sets can be encoded by natural numbers, and set operations such 
as pairing and union can then be interpreted as operations on numbers. This 
“arithmetization” of finite set theory is based on the ideas of Gödel (1931), who
arithmetized formal logic to prove his famous theorem on the incompleteness of 
formal systems for arithmetic. The details of arithmetization are tedious and do not 
concern us, but it is useful to know that arithmetic is strong enough to interpret other 
systems for operating on finite objects. Its ability to interpret finite set theory is the 
reason we say that arithmetic is equivalent to ZF−Infinity.
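
The text does not fix a particular encoding of finite sets by numbers. One standard choice, usually attributed to Ackermann, sends a hereditarily finite set $x$ to the number whose binary digits record the codes of the members of $x$. The sketch below assumes that coding and models sets as nested `frozenset` values; the names `encode` and `decode` are illustrative. Under this coding, membership becomes a test on binary digits and union becomes bitwise "or".

```python
def encode(s: frozenset) -> int:
    """Ackermann code: the sum of 2**encode(y) over the members y of s."""
    return sum(2 ** encode(member) for member in s)


def decode(n: int) -> frozenset:
    """Inverse of encode: bit i of n is set exactly when decode(i) is a member."""
    return frozenset(decode(i) for i in range(n.bit_length()) if (n >> i) & 1)


zero = frozenset()
one = frozenset([zero])
two = frozenset([zero, one])
three = frozenset([zero, one, two])

# The ordinals 0, 1, 2, 3 receive the codes 0, 1, 3, 11.
print([encode(x) for x in (zero, one, two, three)])  # [0, 1, 3, 11]

# Decoding recovers the set.
print(decode(11) == three)  # True

# Membership: "two is a member of three" is a bit test on the codes.
print((encode(three) >> encode(two)) & 1 == 1)  # True

# Union of sets corresponds to bitwise "or" of codes.
a, b = frozenset([one]), frozenset([zero, two])
print(encode(a | b) == encode(a) | encode(b))  # True
```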


Exercises 


A typical example of arithmetization is the encoding of an ordered pair (m,n) of natural numbers 
by a single natural number. 


6.6.1 Give ways to encode (m,n) by a natural number; (i) using only addition and multiplication, 
and (ii) using exponentiation. 

6.6.2. Also give an inductive definition of exponentiation, assuming definitions of addition and 
multiplication. 


Ordered pairs are useful for extending the arithmetic of natural numbers to integers and rational 
numbers. 


6.6.3 Suppose we want the pair $(a, b)$ to behave like $a - b$. Under what conditions do $(a, b)$ and
$(a', b')$ represent the same integer?

6.6.4 Also define addition and multiplication of pairs so as to reflect addition and multiplication 
of integers. 

6.6.5 Suppose we want the pair (a, b) to behave like a/b. Under what conditions do (a, b) and 
$(a', b')$ represent the same rational number?

6.6.6 Also define addition and multiplication of pairs so as to reflect addition and multiplication 
of rationals. 


6.7 The Rank Hierarchy 


The claim that ZF captures the idea of building the universe of sets in ordinal-numbered
stages is formalized by a hierarchy of sets $V_\alpha$ and the associated concept
of rank, which are defined by induction over all the ordinal numbers $\alpha$.

Definition. Sets $V_\alpha$ are defined as follows:


• $V_0 = 0$ (the empty set),
• $V_{\alpha+1} = P(V_\alpha)$,
• $V_\lambda = \bigcup_{\beta < \lambda} V_\beta$ for each limit ordinal $\lambda$.


The rank of a set $X$ is defined inductively as the least $\alpha$ such that each member of $X$
has rank less than $\alpha$.
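
The finite stages of this hierarchy are small enough to compute. The following sketch assumes sets are modelled as nested `frozenset` values and covers only the finite levels $V_0, \ldots, V_4$ (the next level already has 65536 members); the names `power_set`, `V`, and `rank` are illustrative. For a finite set, the least $\alpha$ exceeding the ranks of all members is the maximum of those ranks plus one, or 0 for the empty set.

```python
from itertools import combinations


def power_set(s: frozenset) -> frozenset:
    """The set of all subsets of s."""
    items = list(s)
    return frozenset(frozenset(c)
                     for r in range(len(items) + 1)
                     for c in combinations(items, r))


def V(n: int) -> frozenset:
    """The finite stage V_n: apply the power set operation n times to the empty set."""
    level = frozenset()
    for _ in range(n):
        level = power_set(level)
    return level


def rank(x: frozenset) -> int:
    """Least number greater than the rank of every member of x."""
    return max((rank(m) + 1 for m in x), default=0)


# Sizes of V_0, ..., V_4.
print([len(V(n)) for n in range(5)])  # [0, 1, 2, 4, 16]

# V_3 consists exactly of the members of V_4 with rank less than 3.
print(V(3) == frozenset(x for x in V(4) if rank(x) < 3))  # True

# The ordinal 3 = {0, 1, 2} has rank 3 and first appears in V_4.
zero = frozenset()
one = frozenset([zero])
two = frozenset([zero, one])
three = frozenset([zero, one, two])
print(rank(three), three in V(4), three in V(3))  # 3 True False
```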


Thus, $V_\alpha$ may be viewed as the set of all sets built using $\le \alpha$ applications of the
power set operation P, and the claim that every set is built at some stage is: 


Existence of Rank. Each set X has a rank. 


Proof. Suppose $X$ is a set that does not have a rank. Then some member $X_1$ of $X$
also does not have a rank. For if each $x \in X$ has a rank, rank($x$), the replacement
axiom tells us that the range of the rank function on $X$ is a set of ordinals, with union
$\alpha$ say. This means that $X$ has a rank ($\le \alpha + 1$), contrary to assumption.

Thus, there are members $X_1$ with no rank, and similarly members $X_2$ of these $X_1$
with no rank, and so on. With the help of the replacement and union axioms we can
collect these


members $X_1$ of $X$ with no rank,
members $X_2$ of members of $X$ with no rank,
$\vdots$


into a single set $N$. But then $N$ is a set with no $\in$-minimal member, contrary to the
foundation axiom. □


The universe of all sets can therefore be viewed as the union of the sets $V_\alpha$ as $\alpha$
ranges over all the ordinals. It is natural to use the symbol V to denote the universe 
of all sets, though one should always remember that V is not a set. [If it were, P(V) 
would have cardinality greater than the universe, which is absurd.] 


6.7.1 Cardinality 


In Sect. 3.3 we defined sets to be of the same cardinality if there is a bijection 
between them. This suggests that such sets share a common property, called 
cardinality or cardinal number, which we have not yet defined. Up until now, the 
problem in defining “cardinality” is that the collection of all sets with the same 
cardinality is not a set (for much the same reason that the union $V$ of all $V_\alpha$ is not
a set). But with the help of the concept of rank we can get around this problem as 
follows. 


Definition. For any set $X$, let $\alpha$ be the minimal rank of a set with the same
cardinality as $X$. Then the cardinality of $X$ is the set


$\{Y \in V_{\alpha+1} : Y$ has the same cardinality as $X\}$.

It still does not seem right to call this set a cardinal number, because it is not
clear that cardinalities can be ordered. It would be simpler if each set X had the 
same cardinality as an ordinal, in which case we could take the least such ordinal 
as the cardinal number of X. This can in fact be achieved with an extra axiom, the 
axiom of choice, which is commonly added to ZF for this reason. There are in fact 
many advantages to the axiom of choice, and some disadvantages, which we discuss 
in the next chapter. 


Exercises 


6.7.1 Show that the rank of an ordinal $\alpha$ is $\alpha$.

6.7.2 Show that the collection of all ordinals is also not a set. 

6.7.3 By making suitable definitions of rational numbers and real numbers, find the ranks of $\mathbb{Q}$
and $\mathbb{R}$.


Also locate the following sets in the rank hierarchy. 


6.7.4 The set $\mathbb{N} \times \mathbb{N}$ of all ordered pairs from $\mathbb{N}$.
6.7.5 The set of all functions $\mathbb{N} \to \mathbb{N}$.


6.8 Large Sets 


In Sect. 3.8 we claimed that there are “largeness” properties so extreme that sets with 
those properties cannot be proved to exist. We suggested that one such “largeness” 
property is inaccessibility, where an inaccessible set is one that has infinite members 
and is closed under the operations of power set and taking ranges of functions. It 
should now be apparent that if $V_\alpha$ is an inaccessible set, then $V_\alpha$ satisfies the ZF
axioms.

Certainly, if $V_\alpha$ is large enough to have an infinite member, then it satisfies
the empty set and infinity axioms. It satisfies power set and replacement by the
hypothesis of closure under power set and taking ranges of functions. Closure under
power set also guarantees that $\alpha$ is a limit ordinal, in which case $V_\alpha$ is also closed
under pairing and union, so $V_\alpha$ satisfies the pairing and union axioms. Finally, any
$V_\alpha$ satisfies foundation, so $V_\alpha$ satisfies all the ZF axioms.

It follows that $V_\alpha$ also satisfies any logical consequence of the ZF axioms; that
is, any proposition provable in ZF set theory. But now suppose we take the least $\alpha$
such that $V_\alpha$ is inaccessible. It follows that any $V_\beta$ in $V_\alpha$ is not inaccessible, so $V_\alpha$
satisfies the sentence “there is no inaccessible $V_\beta$.” Existence of an inaccessible set
is therefore not provable in ZF.

This explains the surprising claim made at the end of Sect. 3.8: if inaccessible 
sets exist, then their existence is not provable in ZF. 

It is actually in the nature of strong axiom systems like ZF that there are 
many sentences they can state, but neither prove nor disprove. Such sentences
are said to be independent of the system in question. The famous incompleteness
theorem of Gödel (1931), mentioned in Sect. 6.6, gives a general explanation for
this phenomenon of independent sentences. ZF is particularly remarkable because
its independent sentences include very natural ones for which independence can
be established without appealing to the Gödel incompleteness machinery. They
include the existence of “large” sets, such as inaccessibles (as was first noticed by
Kuratowski (1924)), but also the axiom of choice (AC) and the continuum hypothesis
(CH). The independence of AC and CH was established by a combination of
the works of Gödel (1939) and Cohen (1963).

It should also be mentioned that for any “sufficiently strong” axiom system $\Sigma$
there is a sentence Con($\Sigma$), expressing the consistency of $\Sigma$, which is independent of
$\Sigma$ if $\Sigma$ is consistent. This result is known as Gödel’s second incompleteness theorem.
It means that when we use the axioms of a strong system, such as ZF, we not only
assume the axioms but also their consistency. This is natural enough, I suppose. But
it means that when we claim that a sentence of ZF is independent we really should
add “assuming that ZF is consistent.” Because of this, statements about consistency
of strong systems take a relative form: “if $\Sigma$ is consistent then so is $\Sigma'$.”

For example, the results of Gödel (1939) have the form:


If ZF is consistent, then so is ZF+AC+CH. 


This is enough to guarantee that there is no harm in using AC or CH on top of ZF. 
No contradiction will result, unless there is already a contradiction in ZF. 

Another relative consistency result, due to Solovay (1970), shows that inacces- 
sibles affect the fundamental problem of measuring sets of real numbers, raised in 
Sect. 1.7: 


If ZF+AC+“an inaccessible set exists” is consistent, 
then so is ZF+“all sets of real numbers are measurable.”


Surprisingly, it is not possible to prove the consistency of the latter theory from 
Con(ZF) alone; one really needs the extra strength derived from the assumption of 
an inaccessible set. Under this assumption, Solovay constructs a model of ZF+“all 
sets of real numbers are measurable”; that is, a collection of sets satisfying all the
ZF axioms and in which all sets of reals are measurable. Another remarkable feature 
of Solovay’s model (already mentioned in Sect. 5.7) is that it has the perfect set 
property for all sets of reals. The concept of measurability in Solovay’s model is a 
very broad one—known as the Lebesgue measure—which we will study in Chap. 9. 


6.9 Historical Remarks 


The ordinal numbers were introduced by Cantor (1883) as a natural extension of
the positive integers 1, 2, 3, .... At first Cantor was interested in the ordinals with
countably many predecessors, such as

$$\omega,\ \omega + 1,\ \ldots,\ \omega \cdot 2,\ \ldots,\ \omega^2,\ \ldots,$$


which he needed to count the iterations of his derived set operation. To create these 
numbers he appealed to two “generating principles”: 


1. Forming the successor of any ordinal. 
2. Forming the “limit,” or least upper bound, of any set of ordinals with no greatest 
member. 


Cantor was unclear about how a set of ordinals might be specified. But, applied 
to the “set of ordinals with countably many predecessors,” his second generating 
principle produces a spectacular result: the least uncountable ordinal. 

In this way, Cantor discovered a new path to uncountable sets. Indeed, he 
believed that his second generating principle would produce ordinals of higher and 
higher cardinality—giving a “scale” by which it might be possible to measure the 
cardinality of other uncountable sets, such as R. 

This was how Cantor arrived at his second, and stronger, form of the continuum 
hypothesis: R has the same cardinality as the first uncountable ordinal. He was at 
first optimistic that his theory of ordinal numbers would enable him to prove the 
continuum hypothesis, but the problem was harder than he expected, and there was 
a hiatus in his set theory research until the 1890s. 

In the meantime, Dedekind (1888) published his theory of natural numbers in 
a small book Was sind und was sollen die Zahlen? (What are numbers and what 
are they for?). As mentioned in Sect. 2.2, his book was in part a rediscovery 
of Grassmann’s inductive foundations for arithmetic, but Dedekind went further 
by establishing a set-theoretic foundation for induction itself. In particular, in his 
Theorem 126, Dedekind proved the first theorem asserting the existence of functions 
defined by induction. His proof, by piecing together partial functions, is the ancestor 
of many similar arguments, such as the one used in Sect. 6.3 to prove that well- 
ordered sets are isomorphic to ordinals. 

With these results, Dedekind went further than any of his contemporaries in 
building set-theoretic foundations for mathematics. But in one respect he went too 
far—in his Theorem 66: There exist infinite systems. Dedekind argued that the realm 
S of his own thoughts is infinite, because for any thought s there is the thought 


$$\varphi(s) = \text{“$s$ can be thought”}.$$


Since not every thought is of the form $\varphi(s)$, $\varphi$ is a bijection between $S$ and a proper
subset of itself. Hence $S$ is infinite. QED.

One problem with this argument, of course, is that S is not well-defined by 
mathematical standards. A deeper problem, which Dedekind did not foresee, is that 
even well-defined properties may not define sets, as mathematicians were about to 
learn in the 1890s. 

As we saw in Sect. 3.8, in 1891 Cantor discovered that any set has more subsets
than elements, so there is no largest set. Cantor was pleased with this discovery,
because it put his 1883 belief in the ever-growing scale of ordinal numbers on
a sound basis. But it was bad news for mathematicians who thought that every
property determines a set. With no largest set, there is no “set of all sets,” and hence
there is no set corresponding to the property of being a set. Dedekind was disturbed
by Cantor’s discovery, to the extent that he withdrew a new edition of Was sind und
was sollen die Zahlen? that was due to be published in 1903. (See Ferreirós 1999,
p. 296.)

Cantor was not disturbed; in fact he tried to profit from the related result
that there is no set $\Omega$ of all ordinals. He hoped to use this fact to prove the
well-ordering theorem that every set can be well-ordered. His erroneous (and
unpublished) argument is described in Ferreirós (1999), p. 295. Suppose, for the
sake of contradiction, that $V$ is a set with no well-ordering. It seemed to Cantor that
$V$ must then be so large that any ordinal, and hence $\Omega$ itself, can be mapped into $V$.
But in that case $V$ is a contradictory set like $\Omega$.

Zermelo was the first to notice a flaw in the details of Cantor’s argument: an 
unconscious use of what we now call the axiom of choice when mapping ordinals 
into V. The axiom of choice had been used several times in set theory and analysis 
before this, as we will see in the next chapter. Zermelo was the first to bring it to 
light, and in Zermelo (1904) he proved that the well-ordering theorem is actually 
equivalent to the axiom of choice. Since 1904 the axiom has played an important 
role in set theory, as the principle underlying many results not provable from the ZF 
axioms alone. 

As mentioned in Sect. 6.5, most of the ZF axioms were proposed by Zermelo 
(1908). It is thought by some historians that Zermelo’s motive was to establish 
foundations for his proof of the well-ordering theorem, but his declared intention 
was to avoid paradoxes, such as the “set of all sets.” The Zermelo axioms do this
in a natural way by building sets cumulatively from the bottom up: starting from
the empty set and generating all other sets by pairing, union, power set, and the
“definable subset” axiom, which Zermelo called Aussonderung (“cutting out”).
Aussonderung asserts that, for any set X and any well-defined property P, the 
members of X with property P form a set. Thus, properties can define sets, but 
only as subsets of sets already defined. Because of this, the existence of an infinite 
set has to be an axiom—not a theorem as Dedekind had hoped. 

Zermelo set theory cannot prove the existence of sets of high rank. As Fraenkel 
(1922) observed, it cannot prove the existence of the set 


$$\{\mathbb{N},\ \mathcal{P}(\mathbb{N}),\ \mathcal{P}(\mathcal{P}(\mathbb{N})),\ \ldots\},$$

which is the range of the function $f$ defined on $\mathbb{N}$ by

$$f(0) = \mathbb{N}, \qquad f(n + 1) = \mathcal{P}(f(n)).$$

This is one of the reasons why Fraenkel strengthened the Zermelo axioms with
the replacement axiom. The schema of the replacement axiom generalizes Aussonderung,
while still being in the spirit of building all sets from those previously
constructed.

Fig. 6.3 Ernst Zermelo, Abraham Fraenkel, and John von Neumann


The paper of von Neumann (1923) helped to popularize the ZF system with his 
elegant definition of ordinals, and proofs of the basic results about them. These 
included the theorem that every well-ordered set is isomorphic to an ordinal and the 
transfinite generalization of Dedekind’s theorem on definition by induction. Finally, 
von Neumann (1929) cemented the relationship between ZF and the cumulative set 
concept by using the foundation axiom to prove that every set belongs to some $V_\alpha$.

The picture of John von Neumann in Fig. 6.3 is a 1925 photograph from the 
John von Neumann Collection, Archives of American Mathematics at SRH (item 
e_math_00134 from Box 4RM51). It is at the Dolph Briscoe Center for American 
History in Austin, Texas, and is used with their permission. 


Chapter 7 
The Axiom of Choice 


PREVIEW 


The ZF axioms allow us to assert the existence of any set whose members 
are selected according to some definable “rule”; this is essentially what the
replacement schema says. However, we often want to assert the existence of a 
set without knowing a rule for selecting its members. Typically, the members are 
simply “chosen” from other sets, but not according to any “rule.” When infinitely 
many choices are required, we may not be able to guarantee the existence of the set 
without some axiom of choice. 

The full axiom of choice (AC), described in Sect. 7.1, is a powerful axiom that 
greatly simplifies set theory. In particular, it implies that any set can be well-ordered, 
so that methods such as induction—previously applicable only to countable sets— 
apply to all sets. 

On the other hand, AC also has some negative consequences, inasmuch as it 
implies the existence of sets with irregular or even bizarre properties. We give one 
example of an irregular property (an undetermined set) in Sect. 7.6, and further 
examples occur in Chap. 9. 

For this reason it is also of interest to consider weaker axioms of choice, 
with consequences that are entirely positive. One of these, the countable axiom 
of choice (countable AC), is of particular value in analysis, because it is strong 
enough to prove some desirable properties of sets and functions, but too weak 
to admit the bizarre consequences of full AC. To illustrate what we mean by 
“positive” consequences of choice, we begin this chapter with some applications 
of countable AC. 


7.1 Some Naive Questions About Infinity 


In the early days of set theory the following questions arose and, seemingly, were
easily answered.

1. Does every infinite set have a countably infinite subset?
The naive answer is yes, because if $S$ is infinite we can remove an element
$s_1$ from $S$, and $S - \{s_1\}$ is still infinite. Then we can remove an element $s_2$ from
$S - \{s_1\}$ and $S - \{s_1, s_2\}$ is still infinite; and so on. Proceeding in this way, we can
remove an infinite sequence $s_1, s_2, s_3, \ldots$ from $S$, so $\{s_1, s_2, s_3, \ldots\}$ is a countably
infinite subset of $S$.
2. Is a countable union of countable sets countable?
Again, the naive answer is yes. If $\{S_1, S_2, S_3, \ldots\}$ is a countable set of
countable sets, let

$$
\begin{aligned}
S_1 &= \{s_{11}, s_{12}, s_{13}, \ldots\} \\
S_2 &= \{s_{21}, s_{22}, s_{23}, \ldots\} \\
S_3 &= \{s_{31}, s_{32}, s_{33}, \ldots\}
\end{aligned}
$$

Then we can enumerate the members $s_{ij}$ of the union of these sets $S_i$ in the same
way that we enumerate the members $(i, j)$ of $\mathbb{N} \times \mathbb{N}$, namely:

$$S_1 \cup S_2 \cup S_3 \cup \cdots = \{s_{11}, s_{21}, s_{12}, s_{31}, s_{22}, s_{13}, \ldots\}.$$

Hence $S_1 \cup S_2 \cup S_3 \cup \cdots$ is countable.
3. If a function $f$ is sequentially continuous at $x$, is $f$ continuous at $x$?
We call a function $f$ sequentially continuous at $x$ if $f(x_i) \to f(x)$ for every
sequence $(x_1, x_2, x_3, \ldots)$ such that $x_i \to x$. As we know from Sect. 4.2, $f$ is
continuous at $x$ if, for each $\varepsilon > 0$ there is a $\delta > 0$ such that

$$|x' - x| < \delta \implies |f(x') - f(x)| < \varepsilon.$$

So, given $\varepsilon$, we want to use sequential continuity to find a $\delta$. Well, the alternative
is that, for some $\varepsilon_0$, there is no $\delta$. In this case we can find an $x'_1$ with

$$|x'_1 - x| < 1/2 \quad\text{and}\quad |f(x'_1) - f(x)| \ge \varepsilon_0,$$

then an $x'_2$ with

$$|x'_2 - x| < 1/4 \quad\text{and}\quad |f(x'_2) - f(x)| \ge \varepsilon_0,$$

then an $x'_3$ with

$$|x'_3 - x| < 1/8 \quad\text{and}\quad |f(x'_3) - f(x)| \ge \varepsilon_0,$$

and so on. We therefore have a sequence $(x'_1, x'_2, x'_3, \ldots)$ with $x'_i \to x$ but with
$f(x'_i) \not\to f(x)$, contrary to sequential continuity.
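
The enumeration used in question 2 runs along the diagonals of the array of index pairs. The following is a minimal Python sketch of that ordering, not something taken from the text: the names `diagonal_pairs` and `s` are illustrative, and `s(i, j)` is a hypothetical stand-in for "the j-th member of the i-th set" under enumerations that are assumed to be already fixed.

```python
from itertools import islice

def diagonal_pairs():
    """Yield index pairs (i, j) with i, j >= 1, one diagonal at a time.

    Diagonal d holds the pairs with i + j = d + 1, listed with i
    decreasing, which gives the order 11, 21, 12, 31, 22, 13, ...
    """
    d = 1
    while True:
        for i in range(d, 0, -1):
            yield (i, d + 1 - i)
        d += 1

def s(i, j):
    """Hypothetical stand-in for the j-th member of the i-th set."""
    return f"s{i}{j}"

first_six = [s(i, j) for i, j in islice(diagonal_pairs(), 6)]
print(first_six)
```

The sketch takes the enumeration of each set for granted. Fixing all of those enumerations at once is the step where a choice principle is needed.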

What these examples have in common is an infinite sequence of choices: in the
first example we infinitely often choose a new member from the set $S$, in the second
we choose an enumeration of each set $S_i$, and in the third we choose infinitely many
real numbers $x'_i$. This may seem like the proof of the Bolzano–Weierstrass theorem
in Sect. 3.6, where we chose an infinite sequence of intervals $I_n \subseteq [0, 1]$, but there is
an important difference. In the proof of the Bolzano–Weierstrass theorem we were
able to define the sequence $(I_1, I_2, I_3, \ldots)$, by taking $I_n$ to be the leftmost half of $I_{n-1}$
that contains infinitely many points of the given set $S \subseteq [0, 1]$.

In the three examples above the sequence of choices comes with no apparent 
definition—one just has to believe that it exists. Around 1900, it gradually became 
clear that an axiom of choice should be built into set theory to support all cases 
where a set is claimed to exist by virtue of an infinite sequence of choices. This 
was done by Zermelo (1904), who in fact proposed a stronger axiom allowing any 
infinite set of choices. There are many ways to state Zermelo’s axiom of choice 
(AC), some of which we study later, but the most convenient to begin with is the 
following: 


Axiom of Choice. If $X$ is any set whose members are nonempty, then there exists a
function $F$, called a choice function for $X$, such that $F(x) \in x$ for each $x \in X$.


Thus, the function F “chooses” a member F(x) from each member x of X. Here 
is how the axiom of choice is deployed in the three examples above. 


1. Let $X$ be the set of all nonempty subsets of the given infinite set $S$, and let $F$ be a
choice function for $X$. Then we can define the sequence $(s_1, s_2, s_3, \ldots)$ inductively
in terms of $F$:

$$s_1 = F(S), \qquad s_n = F(S - \{s_1, s_2, \ldots, s_{n-1}\}).$$

2. Let $E(S_i)$ be the set of all enumerations of the countably infinite set $S_i$, let

$$X = \{E(S_1), E(S_2), E(S_3), \ldots\},$$

and let $F$ be a choice function for $X$. Then we can define the enumeration
$\{s_{i1}, s_{i2}, s_{i3}, \ldots\}$ of $S_i$ as $F(E(S_i))$ and complete the proof as before.
3. Our assumption is that, in any open interval $I$ that contains $x$ there exists an $x'$
with $|f(x') - f(x)| \ge \varepsilon_0$. We therefore have, for each $n$, a nonempty set

$$J_n = \{x' : |x' - x| < 1/2^n \text{ and } |f(x') - f(x)| \ge \varepsilon_0\}.$$

We define

$$X = \{J_1, J_2, J_3, \ldots\}$$

and let $F$ be a choice function for $X$. Then $x'_n = F(J_n)$ has the property

$$|x'_n - x| < 1/2^n \quad\text{and}\quad |f(x'_n) - f(x)| \ge \varepsilon_0,$$

as required for the proof.
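
The inductive definition in the first example can be traced in a few lines of Python. This is only a sketch under strong simplifying assumptions: a finite set stands in for the infinite set S, and the choice function is taken to be `min`, which is a definable rule on sets of natural numbers and so needs no axiom of choice. The name `choose_sequence` is illustrative.

```python
def choose_sequence(S, F, n):
    """Return [s_1, ..., s_n], where s_k = F(S - {s_1, ..., s_(k-1)}).

    F plays the role of a choice function on the nonempty subsets of S;
    n must not exceed the number of elements of S.
    """
    chosen = []
    remaining = set(S)
    for _ in range(n):
        s_k = F(frozenset(remaining))  # F picks a member of a nonempty set
        chosen.append(s_k)
        remaining.remove(s_k)          # the next choice avoids earlier ones
    return chosen

# Illustrative choice function: "least element" (an assumption of this sketch).
print(choose_sequence(range(10, 20), min, 4))  # [10, 11, 12, 13]
```

Each choice depends on the ones made before it, which is the pattern the text goes on to call dependent choice. The axiom matters only when no rule like `min` is available.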


In the latter two cases we are using the so-called countable choice axiom, where 
choices are made from each member of a countable set. In the first case we use 
the so-called dependent choice axiom, where a sequence of choices is made, each 
dependent on the one before. Countable choice and dependent choice are the most 
common choice principles used in analysis. As is clear from the examples above, it 
is hard to do without these principles, and they seem natural and harmless. 

The full axiom of choice has some useful consequences, as we will see in the 
next section, but also some consequences that are not convenient for analysis, as 
we will see in the next chapter. For this reason, we will be careful to distinguish 
between the full axiom and weaker forms (such as dependent and countable choice) 
in this book. Our understanding of the real numbers turns out to depend very much 
upon the strength of choice principles we assume. 


Exercises 


Recall, from Sect. 3.9, Dedekind’s definition of an infinite set: $S$ is infinite if there is a bijection
$f : S \to T$, where $T$ is a proper subset of $S$. The following exercises show that this property is
equivalent to the existence of a countably infinite subset of $S$.


7.1.1 Show that, if $s \in S - T$, then $\{s, f(s), f(f(s)), \ldots\}$ is a countably infinite subset of $S$.
7.1.2 Show, conversely, that a countably infinite subset of $S$ gives a bijection $f : S \to T$, where
$T$ is a proper subset of $S$.


In ZF set theory it is not provable that every infinite subset of R contains a countable subset. We 
are therefore free to explore the possibility of infinite subsets of R without countable subsets. This 
turns out to throw light on the difference between sequential continuity and ordinary continuity. 


7.1.3 Suppose that $S \subset \mathbb{R}$ is infinite but with no countable subset. Explain why this gives an
infinite $T \subset [0, 1]$ with no countable subset.

7.1.4 Let $T \subset [0, 1]$ be the set mentioned in Exercise 7.1.3, and let $t$ be a limit point of $T$, given
by the Bolzano–Weierstrass theorem. Explain why we can assume $t \notin T$, without loss of
generality.

7.1.5 If $t$ and $T$ are as in Exercise 7.1.4, show that $t$ is not the limit of any sequence
$t_1, t_2, t_3, \ldots \in T$.

7.1.6 Show the characteristic function of $T$ is sequentially continuous at $x = t$, but not continuous
there.


7.2 The Full Axiom of Choice and Well-Ordering


As we saw in the previous section, the axiom of choice allows us to make inductive 
constructions involving an infinite sequence of choices. So far, we have done this 
only for countable sequences, but there is nothing to stop us continuing through 
ordinal number stages until the task is complete. The most famous application of 
this idea is the following theorem of Zermelo (1904).


Well-ordering Theorem. Assuming the full axiom of choice, there is a bijection
between any set $X$ and an ordinal.


Informal proof. The basic idea could not be simpler: repeatedly choose elements
$x_0, x_1, x_2, \ldots$ from $X$, assigning them ordinal numbers as subscripts. When all the
ordinals less than $\alpha$ have been assigned, the next element chosen is assigned
subscript $\alpha$. For example, once we have chosen elements $x_n$ for all natural numbers
$n$, the next element chosen (if any remain) is called $x_\omega$.

Since any set of ordinals has a least upper bound $\alpha$, we can continue assigning
ordinals to members of $X$ until $X$ is exhausted. This gives a bijection between $X$
and some ordinal $\alpha$ (the least upper bound of the ordinals assigned to members
of $X$).


Proof. Now we formalize the above idea with the help of a choice function $F$ for
the nonempty subsets of $X$. $F$ enables us to define the following function $g$, mapping
ordinals into $X$, by induction:

$$g(0) = x_0 = F(X),$$

and if $g(\beta)$ has been defined for $\beta < \alpha$, let

$$g(\alpha) = x_\alpha = F(X - \{x_\beta : \beta < \alpha\}).$$

To see that each member of $X$ equals $x_\beta$ for some ordinal $\beta$, consider the
members of $X$ that are of the form $g(\beta) = x_\beta$. These form a subset $S$ of $X$ (by
the “Aussonderung” axiom), and hence

$$\{\beta : g(\beta) \text{ is defined}\} = g^{-1}(S)$$

is a set of ordinals, by the replacement schema. But this set has an upper bound $\alpha$,
since any set of ordinals has a least upper bound. Hence $g(\alpha) = x_\alpha$ is defined, unless
the elements $x_\beta$, for $\beta < \alpha$, include all elements of $X$.

Since $g(\alpha)$ is not defined, by definition, it follows that $S = X$, and that $g$ is a
bijection between $X$ and the ordinal $\alpha$. □


This theorem is called the well-ordering theorem because it says that any set can 
be ordered like the ordinal numbers, which are well-ordered by the < relation as 
defined in Sect. 6.3: 


1. The relation $<$ is a linear order: that is,

for any ordinal $\alpha$, $\alpha \not< \alpha$,
for any two distinct ordinals $\alpha$ and $\beta$, either $\alpha < \beta$ or $\beta < \alpha$,
for any three ordinals $\alpha$, $\beta$, and $\gamma$, if $\alpha < \beta$ and $\beta < \gamma$ then $\alpha < \gamma$.

2. Any set of ordinals has a least member in the ordering by $<$.

Well-ordering carries over to any set $X$ if we label its elements with ordinal number
subscripts as in the above proof, and then order its elements by the relation $\prec$
defined by

$$x_\alpha \prec x_\beta \iff \alpha < \beta.$$

The relation $\prec$ on $X$ then inherits the well-ordering properties from the relation $<$ on
ordinal numbers. We have to use the new symbol $\prec$ because the relation $\prec$ may be
entirely different from the ordinary $<$ relation on $X$ (if $<$ makes sense on $X$ at all).

Indeed, the enormity of the well-ordering theorem first becomes apparent when 
we consider the case where X = R, the set of real numbers. The ordinary < relation 
on R is certainly a linear ordering, but it is definitely not a well-ordering, because 
many sets of real numbers do not have a least member under the relation <; for 
example, the set of real numbers > 0. Thus, the well-ordering $\prec$ of R given by
the well-ordering theorem, and ultimately by the axiom of choice, must be utterly
different from the ordinary ordering <. Indeed it turns out that, unless we assume
new axioms of set theory, it is impossible to define a well-ordering of R in the
language of ZF set theory. The well-ordering of R given by the axiom of choice
“just exists”; we cannot describe it.

The elusiveness of well-ordering is symptomatic of the elusiveness of sets 
obtained from the axiom of choice. It cannot be less elusive, because well-ordering 
of every set in fact implies the axiom of choice. Thus, well-ordering of all sets is 
equivalent to the full axiom of choice. 


Well-ordering implies the axiom of choice. If every set has a well-ordering, then
every set has a choice function.


Proof. Given a set $X$ whose members are nonempty sets, we find a choice function
for $X$ as follows. Let $Y$ be the union of all members $x$ of $X$, and take a well-ordering
$\prec$ of $Y$. Then each $x$ is a subset of $Y$, well-ordered by the relation $\prec$. So the function
defined by

$$F(x) = \prec\text{-least member of } x$$

is a choice function for $X$. □
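
The proof above is short enough to mirror directly in code. The Python sketch below does so for a finite example only, as an assumption of the illustration: a list `order` of the elements of Y stands in for the well-ordering, with earlier position meaning smaller. The name `choice_function_from_order` and the sample data are hypothetical.

```python
def choice_function_from_order(order):
    """Build F(x) = least member of x under the ordering given by `order`.

    `order` is a list of distinct elements; an element's position in
    the list plays the role of its place in the well-ordering of Y.
    """
    rank = {y: k for k, y in enumerate(order)}

    def F(x):
        if not x:
            raise ValueError("a choice function is defined only on nonempty sets")
        return min(x, key=rank.__getitem__)

    return F

X = [{"c", "a"}, {"b"}, {"d", "c", "b"}]   # a set of nonempty sets
Y = set().union(*X)                        # the union of the members of X
order = ["d", "b", "a", "c"]               # hypothetical ordering of Y
F = choice_function_from_order(order)
picks = [F(x) for x in X]
print(picks)  # ['a', 'b', 'd']
```

For a finite Y any listing gives a well-ordering, so nothing is assumed here beyond the listing itself. The content of the theorem lies in the infinite case, where a well-ordering of Y has to be supplied.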


From now on we will often abbreviate the axiom of choice by AC. 


7.2.1 Cardinal Numbers 


The well-ordering theorem gives a simple way to define the cardinal number of 
each set, solving the problem we raised in Sect. 6.7. Assuming AC, each set is 
equinumerous with an ordinal by the well-ordering theorem, so we can make the 
definition

$$|X| = \text{cardinal number of } X = \text{least ordinal equinumerous with } X.$$

For example, $|\mathbb{N}| = \omega$ and $|\{\text{countable ordinals}\}| = \omega_1$. When talking about cardinal
numbers, it is usual, following Cantor, to use the symbolism of alephs: $\aleph_0 = \omega$,
$\aleph_1 = \omega_1$, and so on. The aleph symbol $\aleph$ is the first letter of the Hebrew alphabet.

This seemingly redundant notation is useful because there is a cardinal arithmetic
(reflecting size) which is different from ordinal arithmetic (reflecting order).
We can use the same symbols for arithmetic operations in both if we adopt the
convention of using $\aleph_0, \aleph_1, \ldots$ in cardinal arithmetic and $\omega, \omega_1, \ldots$ in ordinal
arithmetic. Here is an example that illustrates this usage. In ordinal arithmetic we
have

$$\omega + \omega = \omega \cdot 2 \ne \omega.$$

But in cardinal arithmetic we want

$$\aleph_0 + \aleph_0 = \aleph_0,$$

to reflect the fact that the union of two disjoint countably infinite sets is countably
infinite. Another example is

$$\omega \cdot \omega = \omega^2 \ne \omega \quad\text{but}\quad \aleph_0 \cdot \aleph_0 = \aleph_0,$$

the latter reflecting the fact that $\mathbb{N}^2$ is equinumerous with $\mathbb{N}$.
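
Both cardinal equations rest on explicit bijections, which can be written down and spot-checked. The Python sketch below is one possible choice, not necessarily the bijections used earlier in the book: it takes the natural numbers to start at 0 (an assumption of the sketch), uses even and odd numbers for the sum, and uses the standard Cantor pairing for the product. The function names are illustrative.

```python
from math import isqrt

def interleave(tag, n):
    """Map two disjoint copies of N onto N: (0, n) -> 2n and (1, n) -> 2n + 1."""
    return 2 * n + tag

def pair(i, j):
    """Cantor pairing: a bijection from N x N onto N, counting along diagonals."""
    return (i + j) * (i + j + 1) // 2 + j

def unpair(k):
    """Inverse of `pair`."""
    d = (isqrt(8 * k + 1) - 1) // 2   # index of the diagonal i + j = d
    j = k - d * (d + 1) // 2          # position along that diagonal
    return (d - j, j)

# Spot checks on an initial segment; a finite check is evidence, not a proof.
codes = sorted(pair(i, j) for i in range(10) for j in range(10) if i + j < 10)
print(codes == list(range(55)))
print(unpair(pair(2, 3)))
```

The first function witnesses the sum equation and the second the product equation. Order is ignored throughout, which is why the ordinal inequalities above do not conflict with these cardinal equalities.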

We are not particularly interested in cardinal arithmetic in this book, although 
many of the equinumerosity results in Sects. 3.1 and 3.3 can be interpreted as 
equations in cardinal arithmetic (see exercises). However, we occasionally take 
advantage of aleph notation to state results about cardinality more concisely. In 
particular, we use the symbol $2^{\aleph_0}$ to denote the cardinal number of R and of
$\mathcal{P}(\mathbb{N})$. This symbol is in keeping with the result from finite mathematics that an
$n$-element set has $2^n$ subsets. Using this symbol, we can express the uncountability
of R by


$$\aleph_0 < 2^{\aleph_0}.$$


Exercises 


R is probably the simplest example of a set for which well-ordering is not provable in ZF. 
Consequently, many interesting properties of R are provable only by assuming some form of the 
axiom of choice. One such property is the existence of a Hamel basis—a basis for R as vector 
space over Q. In other words, a Hamel basis is a set $H \subset \mathbb{R}$ such that:


1. Each $x \in \mathbb{R}$ has the form $x = r_1 x_1 + \cdots + r_k x_k$ for some $x_1, \ldots, x_k \in H$ and $r_1, \ldots, r_k \in \mathbb{Q}$.
2. For distinct $x_i \in H$ and $r_i \in \mathbb{Q}$, $r_1 x_1 + \cdots + r_k x_k = 0$ only if $r_1 = \cdots = r_k = 0$.

7.2.1 Deduce from the above properties that each $x \in \mathbb{R}$ is uniquely expressible in the form

$$x = r_1 x_1 + \cdots + r_k x_k \text{ for some } x_1, \ldots, x_k \in H. \qquad (*)$$

7.2.2 Assuming a well-ordering $y_0, y_1, \ldots, y_\alpha, \ldots$ of R, define a Hamel basis of R by transfinite
induction.


7.2.3 Given a Hamel basis $h_0, h_1, \ldots, h_\beta, \ldots$ of R, let

$$h(x) = \text{coefficient of } h_0 \text{ in the unique expression (*) for } x.$$

Show that $h(a + b) = h(a) + h(b)$ for each $a, b \in \mathbb{R}$, so $h$ is an additive function, but $h$ is
discontinuous.


Find examples from Sects. 3.1 and 3.3 that illustrate the following equations of cardinal 
arithmetic. 


7.2.4 $\aleph_0 + 1 = \aleph_0$.
7.2.5 $2^{\aleph_0} + \aleph_0 = 2^{\aleph_0}$.
7.2.6 $\aleph_0^{\aleph_0} = 2^{\aleph_0}$.


7.3 The Continuum Hypothesis 


After Cantor discovered that R is uncountable, in the 1870s, he began to investigate 
other uncountable sets of real numbers, such as the Cantor set. All of the examples 
he found were actually of the same cardinality as R, which led him to formulate 
the so-called continuum hypothesis. His first version of the continuum hypothesis, 
formulated in Cantor (1878), simply states that every uncountable set of real 
numbers has the same cardinality as R. 

Then in the 1880s he further developed his theory of ordinal numbers and well- 
ordered sets, and became convinced that every set can be well-ordered. In other 
words, he believed in the well-ordering theorem. He was not yet aware of any axiom 
(such as AC) that implies the well-ordering theorem; it is more likely that he simply 
wanted an orderly universe of sets, and this is hardly possible without the well- 
ordering theorem. 

Pursuing this train of thought further, it would be convenient if R could be well-ordered,
and best of all if R had the smallest uncountable cardinality, $\aleph_1$. This was
Cantor’s second version of the continuum hypothesis, formulated in Cantor (1883), 
and it is what is meant by the continuum hypothesis today. In terms of ordinal 
numbers, it is stated as follows: 


Continuum Hypothesis. There is a bijection between R and the least uncountable
ordinal, $\omega_1$.


In the language of cardinal arithmetic: $2^{\aleph_0} = \aleph_1$.

This form of the continuum hypothesis has the advantage of making the 
cardinality of R as simple as possible, but how plausible is it? We reiterate what 
we said in the previous section: unless we assume new axioms of set theory, it is 
impossible to define a well-ordering of R in the language of ZF set theory. In the 
absence of plausible new axioms, AC can only guarantee the existence of a well-ordering
of R; it cannot name any such ordering.

It so happens that there is an axiom, called the axiom of constructibility, which
provides a definable well-ordering of every set and is consistent with the ZF axioms.
This remarkable new axiom was introduced by Gödel (1939) and it gives a definition
of each set in a language $\mathcal{L}$ that includes the symbols of the ZF language plus
symbols for all the ordinals. The sets named by $\mathcal{L}$ are called the constructible sets.
The axiom of constructibility states that every set is constructible, and this axiom
is consistent with ZF (roughly) because $\mathcal{L}$ gives names for enough sets to satisfy
the ZF axioms. Moreover, it is possible to define a well-ordering of all the formulas
in $\mathcal{L}$, and hence of all the constructible sets. It follows that each constructible set
gets a well-ordering, since all of its members are constructible sets. This means that
the universe $L$ of constructible sets satisfies not only the ZF axioms but also the
well-ordering theorem, and hence the axiom of choice.

Moreover, it turns out (by no means obviously), that each real number in $L$ is
defined by a formula in $\mathcal{L}$ involving only symbols for countable ordinals. It follows
from this that the real numbers in $L$ can be ordered in a sequence of length $\omega_1$, and
hence that $L$ satisfies the continuum hypothesis. Thus, $L$ is a model of the ZF axioms,
plus the axiom of choice and the continuum hypothesis. It follows that the latter two
propositions are consistent with the axioms of ZF. This is how Gödel (1939) showed
that the axiom of choice and the continuum hypothesis could not be disproved from
the ZF axioms, though it remained to be seen whether they could be proved.¹

In any case, while a definable well-ordering may be good for the universe, it is not 
necessarily good for R. The definable well-ordering of R implied by the axiom of 
constructibility implies that there are definable subsets of R with bizarre properties, 
as we will see in this chapter and in Chap. 9. Also, the axiom of constructibility 
limits the size of sets that can exist, and modern set theory often requires sets larger 
than the axiom of constructibility will allow. For these reasons, mathematicians have 
not rushed to add the axiom of constructibility to the ZF axioms. It remains a delicate 
matter to decide how ZF should be strengthened to provide the clearest possible 
picture of R. In the remainder of this book we will study how our view of R depends 
upon which axioms are adopted. 


Exercises 


The following exercises explore part of Gödel’s proof that the continuum hypothesis is consistent
with ZF: the cardinality of sets of names in the language $\mathcal{L}$.


7.4.1 Given an infinite countable ordinal $\gamma$, and assuming that there are countably many symbols
in the language of ZF, explain how to use the ordinals $< \gamma$ to encode the symbols of ZF plus
symbols for all the ordinals $< \gamma$.


¹ As mentioned in Sect. 6.8, Cohen (1963) showed that the axiom of choice and the continuum
hypothesis cannot be proved from the ZF axioms. It follows that the axiom of constructibility is
not provable in ZF either.

7.4.2 Assuming the encoding of symbols from Exercise 7.4.1, formulas in the language of ZF
plus symbols for the ordinals $< \gamma$ become finite sequences $(\alpha_1, \ldots, \alpha_n)$ of ordinals less than
$\gamma$. Describe a well-ordering of such sequences and explain why its order type is countable.

7.4.3 Now consider formulas in the language of ZF plus symbols for all countable ordinals. (As
mentioned above, Gödel proved that each constructible real number can be defined by such
a formula.) Deduce from Exercise 7.4.2 that the set of all such formulas can be well-ordered,
with order type $\omega_1$.


7.4 Filters and Ultrafilters 


No axiom of choice is needed to prove results about natural numbers, since the 
induction axiom ensures that N is well-ordered. The lowest level theorems for which 
an axiom of choice may be required are those about sets of natural numbers. In this 
section we will give an example—the existence of a nonprincipal ultrafilter over 
N—which also turns out to be interesting in analysis (see Chap. 9). The ultrafilter 
example is also interesting because it involves an uncountable infinity of choices. 
Such uses of AC can lead to strange results, and we will see in Chap. 9 that this is 
one of them. 

We will take filters and ultrafilters to be certain collections of subsets of N, 
though the definition applies to subsets of any set. 


Definition. A collection ℱ of subsets is called a filter if 


1. ∅ ∉ ℱ. 
2. If A ∈ ℱ and A ⊆ B then B ∈ ℱ. 
3. If A, B ∈ ℱ then A ∩ B ∈ ℱ. 


In other words, a filter is a collection of subsets that does not include the empty set 
and is closed under supersets and intersections. 


The reason for the name “filter” is that if a set A is “caught” in ℱ, then so is any 
set B larger than A. Two important examples of filters on N are the following: 


• For any a ∈ N, ℱ_a = {A ⊆ N : a ∈ A} is a filter. ℱ_a is called the principal filter 
generated by a. 
• The set of cofinite subsets of N, {X ⊆ N : N − X is finite}, is a filter. 


In both of these examples it is easy to check that conditions 1, 2, 3 for a filter are 
satisfied. The principal filter satisfies an additional condition that makes it what we 
call an ultrafilter: 


4. For each B ⊆ N, either B ∈ ℱ or N − B ∈ ℱ. 
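
Conditions 1 to 4 can be checked mechanically when N is replaced by a small finite set. The following Python sketch does this. The finite universe {0, 1, 2, 3} and the names powerset, is_filter, is_ultrafilter, and principal_filter are assumptions made for illustration, not notation from the text. On a finite set every ultrafilter is principal, so this sketch cannot exhibit a nonprincipal ultrafilter.

```python
from itertools import combinations

def powerset(universe):
    """Return all subsets of `universe` as frozensets."""
    items = sorted(universe)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def is_filter(family, universe):
    """Check conditions 1-3 for a family of subsets of `universe`."""
    family = set(family)
    if frozenset() in family:                    # condition 1: no empty set
        return False
    for a in family:
        for b in powerset(universe):
            if a <= b and b not in family:       # condition 2: supersets
                return False
        for b in family:
            if a & b not in family:              # condition 3: intersections
                return False
    return True

def is_ultrafilter(family, universe):
    """Check condition 4 in addition to conditions 1-3."""
    family = set(family)
    u = frozenset(universe)
    return is_filter(family, universe) and all(
        b in family or (u - b) in family for b in powerset(universe))

def principal_filter(a, universe):
    """All subsets of `universe` that contain the element `a`."""
    return {s for s in powerset(universe) if a in s}

UNIVERSE = {0, 1, 2, 3}                          # assumed finite stand-in for N
F_2 = principal_filter(2, UNIVERSE)
# All sets containing both 0 and 1: a filter that fails condition 4.
F_01 = {s for s in powerset(UNIVERSE) if {0, 1} <= s}

print(is_ultrafilter(F_2, UNIVERSE), is_filter(F_01, UNIVERSE),
      is_ultrafilter(F_01, UNIVERSE))
```

The family F_01 fails condition 4 because neither {0} nor its complement {1, 2, 3} contains both 0 and 1.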


This raises the question: are there any nonprincipal ultrafilters over N? One way 
to answer this question would be to extend the filter of cofinite sets to an ultrafilter: 
there is obviously no a ∈ N that belongs to all cofinite sets, so any ultrafilter 
containing the cofinite sets is not principal. In fact, we will show that any filter 
can be extended to an ultrafilter, by making uncountably many applications of the 
following result. 
Filter Extension. If ℱ is a filter over N that is not an ultrafilter (so both A ∉ ℱ 
and N − A ∉ ℱ for some A) then there is a filter ℋ ⊇ ℱ with A ∈ ℋ. 


Proof. We extend the set ℱ to a set ℋ in two stages that ensure closure of ℋ under 
intersections and supersets. 


Stage 1. Add to ℱ all sets of the form A ∩ F, where F ∈ ℱ. The resulting set 

  𝒢 = ℱ ∪ {A ∩ F : F ∈ ℱ} 

is closed under intersections, as one sees by forming the intersections of different 
kinds of members: if F_1, F_2 ∈ ℱ then F_1 ∩ F_2 = F ∈ ℱ by closure of ℱ, 
(A ∩ F_1) ∩ F_2 = A ∩ (F_1 ∩ F_2) = A ∩ F ∈ 𝒢, and (A ∩ F_1) ∩ (A ∩ F_2) = 
A ∩ (F_1 ∩ F_2) ∈ 𝒢 likewise. 

Stage 2. Add all supersets B ⊇ A ∩ F of the sets added at Stage 1. Since the 
supersets of each F ∈ ℱ were already in ℱ, the resulting set 

  ℋ = ℱ ∪ {B ⊇ A ∩ F : F ∈ ℱ} 

is closed under supersets. It is also closed under intersections, as we again see by 
cases. For example, if B_1 ⊇ A ∩ F_1 and B_2 ⊇ F_2, then B_1 ∩ B_2 ⊇ A ∩ (F_1 ∩ F_2), 
so B_1 ∩ B_2 is one of the supersets of an A ∩ F, already included. 


Finally, we observe that the empty set ∅ ∉ ℋ because the elements of ℋ are 
supersets of sets of the form F or A ∩ F, where F ∈ ℱ. We know that each F ≠ ∅, 
because ℱ is a filter. And if A ∩ F = ∅ then F ⊆ N − A, which implies N − A ∈ ℱ 
(by closure under supersets), contrary to assumption. □ 
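
The two stages of this proof can be traced on a finite example. The Python sketch below is only an illustration under stated assumptions: the universe {0, 1, 2, 3} stands in for N, and the names powerset, is_filter, and extend_by are hypothetical helpers, not part of the text. It builds the stage 1 sets A ∩ F, adds their supersets, and checks that the result is a filter containing A.

```python
from itertools import combinations

def powerset(universe):
    """Return all subsets of `universe` as frozensets."""
    items = sorted(universe)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def is_filter(family, universe):
    """Check the three filter conditions over a finite universe."""
    family = set(family)
    if frozenset() in family:
        return False
    subsets = powerset(universe)
    closed_up = all(b in family for a in family for b in subsets if a <= b)
    closed_meet = all(a & b in family for a in family for b in family)
    return closed_up and closed_meet

def extend_by(family, a, universe):
    """Extension of a filter by A, assuming neither A nor its complement
    already belongs to the filter."""
    a = frozenset(a)
    stage1 = {a & f for f in family}              # Stage 1: the sets A ∩ F
    stage2 = {b for b in powerset(universe)       # Stage 2: all their supersets
              if any(s <= b for s in stage1)}
    return set(family) | stage2

UNIVERSE = {0, 1, 2, 3}
F = {s for s in powerset(UNIVERSE) if {0, 1} <= s}   # sets containing 0 and 1
A = frozenset({0, 2})                                # neither A nor N - A is in F
H = extend_by(F, A, UNIVERSE)
print(sorted(sorted(s) for s in H))
```

In this example the smallest stage 1 set is A ∩ {0, 1} = {0}, so the extension consists of all subsets that contain 0.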


Filter extension leads us to believe that any filter ℱ that is not an ultrafilter can 
be extended to an ultrafilter 𝒰 by finding a set A such that A ∉ ℱ and N − A ∉ ℱ 
and extending ℱ to include it, then iterating this process “until no such sets remain.” 
This is the kind of infinite process that AC enables us to carry out. Each single step 
of extending ℱ to include A will be called extension of ℱ by A. 


Extension to an Ultrafilter. Any filter over N is contained in an ultrafilter over N. 


Proof. Given a filter ℱ over N we build an increasing sequence of filters ℱ_α 
whose union is an ultrafilter. To define the ℱ_α we use AC to obtain a well-ordering 
A_0, A_1, ..., A_α, ..., for α < λ, of all the subsets of N. Then we let 

  ℱ_0 = ℱ, 
  ℱ_{α+1} = ℱ_α if A_α ∈ ℱ_α or N − A_α ∈ ℱ_α, 
  ℱ_{α+1} = extension of ℱ_α by A_α otherwise, 
  ℱ_β = ⋃_{γ<β} ℱ_γ for each limit ordinal β. 

It follows by filter extension that ℱ_{α+1} is a filter when ℱ_α is. It is also clear that 
⋃_{γ<β} ℱ_γ is a filter when each ℱ_γ is: ⋃_{γ<β} ℱ_γ is closed under intersection and 
supersets because each ℱ_γ is, and ∅ ∉ ⋃_{γ<β} ℱ_γ because ∅ ∉ each ℱ_γ. 

Thus, it follows by transfinite induction that ℱ_α is a filter for each α, and ℱ_λ is 
also a filter by the argument for limit ordinals. Finally, either A_α ∈ ℱ_λ or N − A_α ∈ ℱ_λ 
for each α, by construction, so ℱ_λ is an ultrafilter that extends ℱ. □ 
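
A finite analogue of this construction needs no axiom of choice, because the subsets of a finite set can be listed outright. The Python sketch below runs the successor step through such a list. It is an illustration only: the universe {0, 1, 2}, the enumeration order produced by powerset, and the names extend_by, is_ultrafilter, and extend_to_ultrafilter are assumptions. Limit stages do not arise in the finite case, and the ultrafilter obtained is necessarily principal.

```python
from itertools import combinations

def powerset(universe):
    """List all subsets of `universe`; this list plays the role of A_0, A_1, ..."""
    items = sorted(universe)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def extend_by(family, a, universe):
    """Extension of a filter by A: add all supersets of the sets A ∩ F."""
    stage1 = {a & f for f in family}
    return set(family) | {b for b in powerset(universe)
                          if any(s <= b for s in stage1)}

def is_ultrafilter(family, universe):
    """Check conditions 1-4 over a finite universe."""
    family, u, subsets = set(family), frozenset(universe), powerset(universe)
    if frozenset() in family:
        return False
    up = all(b in family for a in family for b in subsets if a <= b)
    meet = all(a & b in family for a in family for b in family)
    decided = all(b in family or (u - b) in family for b in subsets)
    return up and meet and decided

def extend_to_ultrafilter(family, universe):
    """Apply the successor step once for each subset in the enumeration."""
    u = frozenset(universe)
    current = set(family)
    for a in powerset(universe):
        if a in current or (u - a) in current:
            continue                              # next filter equals the current one
        current = extend_by(current, a, universe) # extension of the current filter by a
    return current

UNIVERSE = {0, 1, 2}
START = {frozenset(UNIVERSE)}                     # the smallest filter: just the whole set
U = extend_to_ultrafilter(START, UNIVERSE)
print(sorted(sorted(s) for s in U))
```

With this enumeration the first undecided set is {0}, and extending by it already yields the principal ultrafilter generated by 0. A different enumeration could yield a different ultrafilter.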


In Sect. 7.8 below we will give another proof that each filter extends to an 
ultrafilter, again using filter extension, but replacing the transfinite induction by 
another form of AC. 


Exercises 


7.4.1 Explain why ℱ = {A ⊆ N : 0 ∈ A and 1 ∈ A} is a filter but not an ultrafilter. 

7.4.2 Show that the filter of cofinite subsets of N is countable. 

7.4.3 Show that the complements N − F of the sets F in a filter ℱ form a Boolean algebra ideal; 
that is, a set ℐ that is closed under unions and under intersections with arbitrary sets ⊆ N. If 
ℱ is an ultrafilter, show that ℐ is a maximal ideal. 

7.4.4 Interpret a nonprincipal ultrafilter ℱ over N as a 0-1 measure μ on P(N). That is, if we set 
μ(A) = 1 for A ∈ ℱ and μ(A) = 0 for A ∉ ℱ, show that we have a measure on all sets A ⊆ N 
which is additive in the sense that μ(A_1 ∪ A_2) = μ(A_1) + μ(A_2) for disjoint A_1, A_2. 


7.5 Games and Winning Strategies 


Our next application of AC is in the theory of infinite games. To put this result in 
context we should first say something about finite games with perfect information. 
We consider two-person games, in which players I and II move alternately and 
there is a bound on the length of complete sequences of moves (“plays” of the 
game). Such a game is called finite, and it is said to be with perfect information 
if each player knows all previous moves. Typical games without perfect information 
are card games, where a player does not initially know the cards another player 
has been dealt, and typical games with perfect information are tic-tac-toe and 
chess. 

In a two-person game with perfect information one of the players may have a 
winning strategy; that is, a rule for making moves that always leads to a win. In tic- 
tac-toe, neither player has a winning strategy, because the game can end in a draw. 
But if we change the rules so that (for example) a draw counts as a win for player I, 
then one of the players does have a winning strategy. This is just one instance of a 
remarkably general, yet simple, theorem: 


Winning strategy theorem. If G is a finite two-person game with perfect information, 
in which every play ends in a win for one of the players, then either player I or 
player II has a winning strategy. 


Proof. Since the game G is finite, all possible plays of G can be captured as paths 
in a tree like that shown in Fig. 7.1. 
[Figure: a tree with root Start; below it the vertices 1, 2, ...; below these the vertices 11, 12, ... and 21, 22, ...] 


Fig. 7.1 The tree of plays of a game 


The vertex Start represents the starting position of the game, the vertices 1, 2, 
... below it represent the positions that can be reached by the first move (which is by I), 
the vertices 11, 12, ... and 21, 22, ... below these represent the positions that can 
be reached by the second move (which is by II), and so on. Since G is finite, there 
is a maximum value N for the length of downward branches from the Start vertex. 

We now prove the existence of a winning strategy for all such games G, by 
induction on N. If N = 1 then the game ends in one move and, by the hypothesis of 
the theorem, every move leads to a win for either I or II. If any move leads to a win 
for I, then choosing that move is a winning strategy for I. If every move leads to a 
win for II, then letting I make any move is a winning strategy for II. This completes 
the base step of the induction. 

Now, for the induction step, suppose that either I or II has a winning strategy for 
any game of length < N. Among such games are the subgames of G whose starting 
positions are the vertices 1, 2, ... in the tree of plays of G. Thus, each of the latter 
games has a winning strategy for either I or II. 

But then I or II has a winning strategy for G itself. If any of the games with 
starting position 1, 2, ... has a winning strategy for I, then I has a winning strategy 
for G. It consists of making his first move into such a subgame, n say, and thereafter 
playing a winning strategy for the game n. If none of the games 1, 2, ... has a 
winning strategy for I, then they all have winning strategies for II, in which case 
II has a winning strategy for G. Namely, II plays a winning strategy for whichever 
game n that player I moves into. 

This completes the induction, and the proof of the theorem. □ 
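
The induction in this proof is, in effect, a recursive procedure on the tree of plays. The Python sketch below follows it under some assumptions made only for illustration: a finished play is a leaf labelled "I" or "II" with its winner, an unfinished position is a nonempty list of subtrees (one per available move), and the names winner, winning_move, GAME_A, and GAME_B are hypothetical.

```python
def winner(tree, player="I"):
    """Return the player with a winning strategy when `player` is to move."""
    if isinstance(tree, str):
        return tree                      # a finished play: the leaf names its winner
    other = "II" if player == "I" else "I"
    # The player to move wins if some move leads into a subgame that player wins;
    # otherwise every subgame is won by the opponent, who therefore wins.
    if any(winner(sub, other) == player for sub in tree):
        return player
    return other

def winning_move(tree, player="I"):
    """Index of a first move into a subgame won by `player`, or None."""
    other = "II" if player == "I" else "I"
    for index, sub in enumerate(tree):
        if winner(sub, other) == player:
            return index
    return None

# Two games of length 2: I chooses a subtree, then II chooses a leaf.
GAME_A = [["I", "II"], ["II", "II"]]     # II can always reach a leaf marked II
GAME_B = [["I", "I"], ["II", "I"]]       # move 0 leaves II only losing replies

print(winner(GAME_A), winner(GAME_B), winning_move(GAME_B))
```

The recursion terminates because the tree has bounded depth, which is the role played by the bound N in the proof.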


Exercises 


A very short proof of the winning strategy theorem may be written down using quantifiers: ∀x, 
meaning “for all x,” and ∃x, meaning “there exists an x.” 
7.5.1 If a_1, b_1, a_2, b_2, ..., a_n, b_n denote the moves made alternately by players I and II in a game 
of length at most 2n, explain why the existence of a winning strategy for II is expressed by 
the formula 


  ∀a_1 ∃b_1 ⋯ ∀a_n ∃b_n (a_1, b_1, ..., a_n, b_n is a win for II). 


7.5.2 Explain why ¬∀xP(x) is equivalent to ∃x¬P(x), where P(x) is any proposition about x and 
¬ means “not.” Similarly explain why ¬∃xP(x) is equivalent to ∀x¬P(x). 

7.5.3 Deduce from Exercise 7.5.2 that the formula saying that II does not have a winning strategy, 


  ¬∀a_1 ∃b_1 ⋯ ∀a_n ∃b_n (a_1, b_1, ..., a_n, b_n is a win for II), 


is equivalent to a formula saying that I has a winning strategy. 


7.6 Infinite Games 


If we remove the restriction that all plays in a game have length bounded by some 
integer N, and if we allow each move to be chosen from some countable set, then 
the tree in Fig. 7.1 now represents all possible plays in a countably infinite game 
with perfect information. Each play in such a game is represented by an infinite 
sequence (a_1, b_1, a_2, b_2, ...), where a_1, a_2, a_3, ... represent the successive moves by 
player I and b_1, b_2, b_3, ... represent the successive moves by player II. The game 
itself is defined by a set X of such sequences; namely, those that represent a win for 
player I. We call this game G_X. Thus, in G_X player I tries to ensure that the sequence 
(a_1, b_1, a_2, b_2, ...) ∈ X, while II tries to ensure that (a_1, b_1, a_2, b_2, ...) ∉ X. 

Such games were first considered by Hugo Steinhaus in 1925 and he conjectured 
that, by analogy with finite games, for any set X one player has a winning strategy 
for G_X. A short while later, Banach and Mazur showed that the Steinhaus conjecture 
is false if we assume the axiom of choice. AC makes it possible to define a set X for 
which neither player has a winning strategy for the game G_X. Such a set X is called 
undetermined. 

The Banach–Mazur proof actually uses a set X ⊆ [0,1], and players I and II 
choose successive digits of a real number in [0,1]. It is similar, but slightly more 
convenient, to use 𝒩 = N^N in place of [0,1], as we do here. In fact, any ultrafilter 
𝒰 ⊆ P(N) containing the cofinite filter gives a natural example of an undetermined 
set X ⊆ 𝒩, as we will see in the exercises. 


7.6.1 Strategies 


Before showing how AC gives an undetermined set, we need to define what a 
strategy is. In any game G_X, a play is a sequence (a_1, b_1, a_2, b_2, a_3, b_3, ...), where 
(a_1, a_2, a_3, ...) is the sequence played by I and (b_1, b_2, b_3, ...) is the sequence 
played by II. A strategy σ is a positive integer-valued function defined on all finite 
sequences of positive integers, including the empty sequence (). Player I plays 
strategy σ by making the moves 


  a_1 = σ(()), 
  a_2 = σ((b_1)), 
  a_3 = σ((b_1, b_2)), 
  ... 


in response to the moves b_1, b_2, ... made by player II. Player II plays strategy σ by 
making the moves 


  b_1 = σ((a_1)), 
  b_2 = σ((a_1, a_2)), 
  b_3 = σ((a_1, a_2, a_3)), 
  ... 


in response to the moves a_1, a_2, a_3, ... made by player I. 

We let σ ∗ b denote the sequence (a_1, b_1, a_2, b_2, ...) that results when I plays 
strategy σ on the sequence b of moves made by player II. And we say that σ is a 
winning strategy for I in the game G_X if σ ∗ b ∈ X for all b ∈ 𝒩. Similarly, we let 
a ∗ σ denote the sequence that results when II plays strategy σ on the sequence a of 
moves made by player I. And we say that σ is a winning strategy for II in game G_X 
if a ∗ σ ∉ X for all a ∈ 𝒩.² 
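
The two ways of playing a strategy can be made concrete by generating initial segments of a play. In the Python sketch below, the particular strategy sigma (one more than the sum of the moves seen so far) and the function names play_I and play_II are assumptions chosen for illustration. A sequence in 𝒩 is modelled as an iterator of positive integers, and only finitely many terms are ever inspected.

```python
from itertools import count, islice

def sigma(seen):
    """A sample strategy: any function from finite tuples to positive integers."""
    return sum(seen) + 1

def play_I(strategy, b_moves):
    """Yield a_1, b_1, a_2, b_2, ... when I plays `strategy` against b_moves."""
    seen = ()
    for b in b_moves:
        # a_n depends only on (b_1, ..., b_{n-1}); b_n is fetched here but not used yet.
        yield strategy(seen)
        yield b
        seen += (b,)

def play_II(a_moves, strategy):
    """Yield a_1, b_1, a_2, b_2, ... when II plays `strategy` against a_moves."""
    seen = ()
    for a in a_moves:
        seen += (a,)
        yield a
        yield strategy(seen)             # b_n = strategy((a_1, ..., a_n))

# The opponent simply plays 1, 2, 3, ... in both examples.
print(list(islice(play_I(sigma, count(1)), 6)))    # [1, 1, 2, 2, 4, 3]
print(list(islice(play_II(count(1), sigma), 6)))   # [1, 2, 2, 4, 3, 7]
```

The first play begins with sigma applied to the empty tuple, which is why a strategy must be defined on the empty sequence when it is used by player I.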

It is an easy exercise to show that the set of strategies σ has the same cardinality 
as the set 𝒩, namely 2^{ℵ_0}. We use this fact in the proof below to define a set X ⊆ 𝒩 
in ordinal-numbered stages α < 2^{ℵ_0}, alongside an enumeration of strategies σ_α for 
α < 2^{ℵ_0}. This, of course, assumes AC to obtain a well-ordering of 2^{ℵ_0}. Also, by 
taking the smallest ordinal equinumerous with 2^{ℵ_0}, we can assume that the set of σ_β for 
β < α has cardinality less than 2^{ℵ_0}. It will also be convenient to use AC to make 
choices at each of the infinitely many stages α. 


An undetermined set. There exists a set X ⊆ 𝒩 for which neither player has a 
winning strategy for the game G_X. 


Proof. Let {σ_α : α < 2^{ℵ_0}} be an enumeration of all strategies. Using this 
enumeration, we will inductively choose the members of disjoint subsets 
of 𝒩, 


  X = {x_α : α < 2^{ℵ_0}} and Y = {y_α : α < 2^{ℵ_0}}, 


as explained below. Each x_α will witness the fact that σ_α is not a winning strategy 
for II in the game G_X, because we will arrange that a ∗ σ_α = x_α ∈ X for some a ∈ 𝒩. 
Each y_α will witness the fact that σ_α is not a winning strategy for I either, because 
we will arrange that σ_α ∗ b = y_α ∉ X for some b ∈ 𝒩. 


²A reason for writing σ on different sides in the two notations is that in σ ∗ b we use σ before 
seeing b, namely, on the empty sequence; and in a ∗ σ we use σ after seeing (the first member 
of) a. 

At stage 0 we make these arrangements, and keep x_0 and y_0 in disjoint sets, by 
letting 


  x_0 = any value of a ∗ σ_0, 
  y_0 = any value of σ_0 ∗ b unequal to x_0. 

Such a value y_0 exists because σ_0 ∗ b takes 2^{ℵ_0} values as b runs through 𝒩, since b 
consists of all the even-numbered places in σ_0 ∗ b. (Indeed, both σ ∗ b and a ∗ σ take 
2^{ℵ_0} values for any fixed σ, a fact we will rely on at stage α.) 


At stage α fewer than 2^{ℵ_0} values x_β, y_β have been chosen so far, so enough values 
a ∗ σ_α and σ_α ∗ b remain to let 


  x_α = any value of a ∗ σ_α not in {y_β : β < α}, 
  y_α = any value of σ_α ∗ b not in {x_β : β < α}. 

It follows by induction on α that X and Y have no common member, and for each 
α < 2^{ℵ_0} we have witnesses to the fact that σ_α is not a winning strategy for either II 
or I in the game G_X. Since the σ_α exhaust all strategies, it follows that the set X is 
undetermined. □ 


Exercises 


7.6.1 Using the Cantor–Schröder–Bernstein theorem, or otherwise, show that the set of strategies 
has the same cardinality as 𝒩. 

7.6.2 Suppose we take A ⊆ [0, 1] and let I and II alternately choose decimal digits of a number 
in [0,1]. Show that II has a winning strategy for the game with A = Q. 

7.6.3 Find a similar example A ⊆ 𝒩. 

7.6.4 By imitating the construction of an undetermined set above, or otherwise, show that 
there is an undetermined set X ⊆ [0,1] for the game where I and II alternately choose 
decimal digits of a number. 


The next group of exercises shows that an undetermined set X is also obtainable from the 
theorem on ultrafilters in Sect. 7.4. Specifically, we use an ultrafilter 𝒰 that extends the cofinite 
filter, the set of cofinite subsets of N. The set X is defined to be the set of sequences 

  (x_1, x_2, x_3, x_4, ...) ∈ 𝒩 


such that 


  x_1 < x_2 < x_3 < x_4 < ⋯ and [1, x_1) ∪ [x_2, x_3) ∪ [x_4, x_5) ∪ ⋯ ∈ 𝒰. 
[Figure: the sets A_1 and A_2 drawn as unions of intervals on a number line marked 1, a_1, b_1, a_2, b_2, a_3, b_3, ...] 


Fig. 7.2 Two members of the ultrafilter 𝒰 


We suppose, for the sake of contradiction, that player I has a winning strategy σ for G_X. That is, 
whatever increasing sequence (b_1, b_2, b_3, ...) is played by II, the sequence 


  (a_1, b_1, a_2, b_2, a_3, b_3, ...) ∈ X, 

when I plays strategy σ. 


7.6.5 Deduce that the set A_1 = [1, a_1) ∪ [b_1, a_2) ∪ [b_2, a_3) ∪ ⋯ ∈ 𝒰 when I plays strategy σ. 


Now consider the following sequence of numbers, also chosen with the help of the function σ: 


  b_2 = σ((a_2)), 
  b_3 = σ((a_2, a_3)), 
  b_4 = σ((a_2, a_3, a_4)), 
  ... 


7.6.6 Show that the play 
  (a_1, a_2, b_2, a_3, b_3, a_4, b_4, ...) is also a win for I, 
and hence that the set A_2 = [1, a_1) ∪ [a_2, b_2) ∪ [a_3, b_3) ∪ ⋯ ∈ 𝒰. 
Thus, we have engineered sets A_1, A_2 ∈ 𝒰 that look like Fig. 7.2. 

7.6.7 Use the fact that 𝒰 is an ultrafilter to deduce that 
  A_1 ∩ A_2 = [1, a_1) ∈ 𝒰, 
and hence that the complement of A_1 ∩ A_2 is a cofinite set not in 𝒰. 


This contradicts the assumption that 𝒰 is an extension of the cofinite filter, so I does not have a 
winning strategy. We find a similar contradiction if player II has a winning strategy, hence neither 
player has a winning strategy for the game G_X. 


7.7 The Countable Axiom of Choice 


The three questions raised in Sect. 7.1 can all be answered by the following special 
case of the axiom of choice—the countable axiom of choice. We call it countable 
AC for short, since we denote the full axiom of choice by AC. 
Countable AC. Any countable set {S_1, S_2, S_3, ...} of nonempty sets S_n has a choice 
function; that is, a function f such that f(S_n) ∈ S_n for each n. 


In question 2 we have to choose an enumeration of each set S_n, so we want a 
choice function for the set {𝒮_1, 𝒮_2, 𝒮_3, ...}, where 


  𝒮_n = {enumerations of S_n}. 


In question 3 we choose a real number x'_n with |x'_n − x| < 1/2^n and |f(x'_n) − f(x)| ≥ ε_0, 
so we want a choice function for the set {S_1, S_2, S_3, ...} where 


  S_n = {x' : |x' − x| < 1/2^n and |f(x') − f(x)| ≥ ε_0}. 


For question 1 it is not so clear what to do, because in the “obvious” solution each 
choice depends on the previous one. However, we can prescribe a suitable countable 
set {𝒮_1, 𝒮_2, 𝒮_3, ...} in advance by defining 


  𝒮_n = {n-element subsets of S}. 


Since S is infinite, each 𝒮_n is nonempty, so by countable AC we choose a set 
f(𝒮_n) = S_n from each 𝒮_n. Then the union of sets S_n is infinite, and countable 
by question 2. 

Thus, countable AC is useful (and in fact necessary) to prove some basic 
theorems of analysis. However, the full axiom of choice, AC, is not necessary to 
prove the above theorems, and in fact AC causes some irregularities in the theory 
of R, as we will see in the next chapter. Therefore, it is of interest to explore other 
axioms, strong enough to imply countable AC for subsets of R, but less disruptive 
to the theory of R than AC. 

An interesting candidate for such an axiom is the following axiom of determinacy, 
AD. We state AD for subsets of 𝒩, or the set of irrationals in [0,1], but the 
corresponding statement for [0,1] or for R is equivalent. 


Axiom of determinacy. For any set X ⊆ 𝒩, either player I or player II has a 
winning strategy for the game G_X. 


AD implies countable AC for subsets of 𝒩. Given a countable set 𝒮 = 
{S_1, S_2, S_3, ...}, where each S_n ⊆ 𝒩 is nonempty, AD gives a choice function for 𝒮. 


Proof. Given a countable set {S_1, S_2, S_3, ...} of nonempty sets S_n ⊆ 𝒩, consider the following 
game. If 
  I plays (a_1, a_2, a_3, ...) ∈ 𝒩 
  and II plays (b_1, b_2, b_3, ...) ∈ 𝒩 


then II wins if and only if (b_1, b_2, b_3, ...) ∈ S_{a_1}. This game can be formulated as G_X 
for a certain X ⊆ 𝒩, namely 

  X = {(a_1, b_1, a_2, b_2, ...) : (b_1, b_2, b_3, ...) ∉ S_{a_1}}. 


Therefore, AD says that either I or II has a winning strategy. 

Now player I does not have a winning strategy for this game, because after I 
plays a_1 player II can always win by playing some (b_1, b_2, b_3, ...) in the nonempty 
set S_{a_1}. So player II has a winning strategy; that is, a function f which (among other 
things) for each a_1 gives a (b_1, b_2, b_3, ...) ∈ S_{a_1}. In other words, a winning strategy 
for II gives a choice function for the sets S_1, S_2, S_3, .... □ 


This surprising theorem tells us that, although AD is incompatible with full AC 
(by the previous section), it actually implies enough choice for some important 
applications to analysis. 


Exercises 


The countable AC is not provable in ZF. In fact ZF cannot prove its consequence that a countable 
union of countable sets is countable, or even the extreme special case that R is not a countable union 
of countable sets. Amazingly, it is consistent with ZF for R to be a countable union of countable 
sets. The need for at least countable AC in analysis is underlined by the bizarre consequences of 
assuming that R is a countable union of countable sets, which include: 


7.7.1 There are countably many sets of measure 0 whose union has measure 1. 

7.7.2 Every set S ⊆ R is a countable union of countable sets. 

7.7.3 Every real function is a limit of limits of continuous functions. (Hint: First prove that any 
function with countably many nonzero values is a limit of continuous functions.) 


7.8 Zorn’s Lemma 


We constructed an ultrafilter by transfinite induction in Sect. 7.4 because induction 
and ordinals are a major theme in this book, and the ultrafilter construction is a 
natural application of them. However, it should be pointed out that many books 
construct ultrafilters by a different method, called Zorn’s lemma, which is useful 
for constructing many types of “maximal” objects. Briefly put, Zorn’s lemma is an 
axiom of choice for people who dislike ordinals. 


Zorn’s Lemma. Suppose that 𝒯 is a set such that each linearly ordered subset 𝒮 
(under the relation of set inclusion)³ has an upper bound: that is, an X ∈ 𝒯 such 
that Y ⊆ X for each Y ∈ 𝒮. Then 𝒯 has a maximal element: that is, a Z ∈ 𝒯 such 
that Z is not properly contained in any other member of 𝒯. 


³The usual statement of Zorn’s lemma does not restrict the ordering to be set inclusion. However, 
this is the only case we need, and there is really no loss of generality. 

Before proving that Zorn’s lemma is equivalent to AC (or to the well-ordering 
theorem), we illustrate the use of Zorn’s lemma by a new proof that every filter 
extends to an ultrafilter. 


Extension to an Ultrafilter. Any filter over N is contained in an ultrafilter over N. 


Proof. Given a filter ℱ over N, let 𝒯 be the set of all filters over N that contain ℱ. 
If 𝒮 is a nonempty set of filters in 𝒯 that is linearly ordered by set inclusion, then 
the union X of all filters in 𝒮 is itself a filter containing ℱ: X is 
closed under intersection and superset because any members of X belong to some 
filter in 𝒮 (this is where the linear ordering is important: we cannot have a pair of 
members of X that do not belong to a single filter in 𝒮), and ∅ is not in X because ∅ 
is not in any member of 𝒮. (The empty subset of 𝒯 has the upper bound ℱ.) 

Thus, 𝒯 satisfies the condition of Zorn’s lemma, and hence 𝒯 has a maximal 
element Z. In other words, Z is a filter containing ℱ that is not properly contained in any 
other filter over N. This implies that Z is an ultrafilter, otherwise it could be extended 
by the filter extension theorem. □ 
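
The conclusion that maximal filters are exactly the ultrafilters can be verified exhaustively on a small finite set, where no form of choice is needed. The Python sketch below is an illustration only: the three-element universe stands in for N, the empty family is excluded from the filters by assumption, and the names is_filter, is_ultrafilter, ALL_FILTERS, and MAXIMAL are hypothetical.

```python
from itertools import combinations

def powerset(items):
    """Return all subsets of `items` as frozensets."""
    items = list(items)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

UNIVERSE = frozenset({0, 1, 2})          # assumed finite stand-in for N
SUBSETS = powerset(UNIVERSE)             # the 8 subsets of the universe

def is_filter(family):
    """Filter conditions 1-3; the empty family is excluded by assumption."""
    if not family or frozenset() in family:
        return False
    up = all(b in family for a in family for b in SUBSETS if a <= b)
    meet = all(a & b in family for a in family for b in family)
    return up and meet

def is_ultrafilter(family):
    """Condition 4 on top of the filter conditions."""
    return is_filter(family) and all(
        b in family or (UNIVERSE - b) in family for b in SUBSETS)

# Examine all 2^8 families of subsets and keep those that are filters.
ALL_FILTERS = [fam for fam in powerset(SUBSETS) if is_filter(fam)]
# A filter is maximal if no other filter properly contains it.
MAXIMAL = [f for f in ALL_FILTERS if not any(f < g for g in ALL_FILTERS)]

print(len(ALL_FILTERS), len(MAXIMAL))
print(all(is_ultrafilter(f) for f in MAXIMAL))
```

Here the filters correspond to the nonempty subsets of the universe (each filter consists of the supersets of its smallest member), and the maximal ones are the three principal ultrafilters.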


The main difference between this proof and the one given in Sect. 7.4 
is replacement of a transfinite repetition of the extension process by the single 
step of selecting a maximal element. This is typical of the way Zorn’s lemma 
works: it is able to hide a transfinite extension process and, not surprisingly, this 
is because a transfinite extension process is built into the proof of Zorn’s lemma 
itself. 


Equivalence Theorem. In ZF, Zorn’s lemma is equivalent to AC. 


Proof. We first use AC to prove Zorn’s lemma. Suppose we are given a set 𝒯 in 
which each linearly ordered subset 𝒮 has an upper bound. AC gives a function f 
such that 


  f(𝒮) = an upper bound of 𝒮 

for each 𝒮 ⊆ 𝒯 that is linearly ordered by ⊆. Moreover, we can stipulate that 

  f(𝒮) ⊋ each element of 𝒮 

if such an element exists. Using the function f, and transfinite induction, we define 
a linearly ordered sequence of sets A_α ∈ 𝒯 whose upper bound is necessarily 
maximal, namely, let 


  A_0 = any element of 𝒯, 
  A_α = f({A_β : β < α}). 


The A_α form a set, by the replacement axiom, since α cannot exceed the cardinality 
of 𝒯. The set of A_α is linearly ordered, by transfinite induction. And its upper bound 
is maximal by definition of f, since f always chooses an element greater than any 
chosen earlier, if it can. 
Conversely, if Zorn’s lemma holds, we can obtain a well-ordering of any set X 
(and hence AC) as follows. Consider the set 𝒯 of all bijections between subsets of 
X and ordinals. Such a bijection, 


  g : Y → α, 


is of course a set (of ordered pairs), and if g_1 ⊆ g_2 then g_2 extends g_1, from a subset 
Y_1 ⊆ X to a larger subset Y_2 ⊆ X, by agreeing with g_1 on Y_1 and mapping the 
members of Y_2 − Y_1 to larger ordinals. 

So if we have a set 𝒮 of these bijections, linearly ordered by inclusion, its union 
will itself be such a bijection, and hence an upper bound of 𝒮 in 𝒯. Thus, 𝒯 satisfies 
the conditions of Zorn’s lemma. 

Zorn’s lemma then gives a maximal element of 𝒯; that is, a bijection g between 
a subset Y ⊆ X and an ordinal α that cannot be extended. It follows that Y = X 
(because if x ∈ X − Y we can extend g : Y → α by the ordered pair (x, α)) and hence 
g gives a well-ordering of X. □ 


Exercises 


Zorn’s lemma is often used to prove the existence of maximal objects in algebra. In the following 
exercises we assume that the reader is familiar with the concepts of vector space, ring, ideal, and 
algebraically closed field. 


7.8.1 Give another proof of the existence of a Hamel basis for R (Exercise 7.2.2) using Zorn’s 
lemma. 

7.8.2 More generally, use Zorn’s lemma to prove that each vector space has a basis. 

7.8.3 Use Zorn’s lemma to prove that each ideal in a ring has an extension to a maximal ideal. 

7.8.4 Use Zorn’s lemma to prove that each field has an algebraic closure. 


7.9 Historical Remarks 


AC was used unconsciously for about 30 years before its explicit statement by 
Zermelo (1904). Many such instances are described in the book Moore (1982). The 
first was the proof by Cantor in 1871 that sequential continuity at a point implies 
ordinary continuity, a result we saw in Sect. 7.1 to depend on countable AC. This 
theorem is attributed to Cantor by Heine (1872), p. 182. The second unconscious 
use of AC was an algebraic theorem of Dedekind (1877) about modules. 

In § 1 of his paper, Dedekind defines a module M to be a set of numbers 
(generally complex numbers) that is closed under addition and subtraction. He calls 
M a module because it leads to a notion of congruence modulo M; namely, for any 
numbers a and b, 


  a ≡ b (mod M) ⇔ a − b ∈ M. 

It follows easily from this definition that the numbers are partitioned into disjoint 
congruence classes, where a and b belong to the same class if and only if a ≡ b 
(mod M). In §2 of his paper, Dedekind claimed that there is a set S that includes 
exactly one member of each equivalence class. This seemingly obvious result, which 
is routine in algebra classes today, depends heavily on AC. 

To appreciate why, consider Q as a module in R. Then a and b belong to the same 
equivalence class if and only if a — b is rational, so the congruence classes are very 
easy to understand. But a set S with exactly one member from each equivalence 
class is obtainable only with the help of AC—countable AC does not suffice—and 
no explicit definition of S can be given. S even has the bizarre property of being 
nonmeasurable, as we will see in Sect. 9.6. Thus, there are hidden depths in the 
seemingly elementary idea of congruence classes. 

Dedekind’s definition of an infinite set also involves countable AC, as we saw in 
the exercises to Sect. 7.1, since it involves choosing a countable infinity of members 
from an infinite set. Bettazzi (1896) was perhaps the first to question whether it is 
valid to make an infinite sequence of choices, but his objection was forgotten. It 
was not raised again until Zermelo (1904) made AC explicit and cited Dedekind’s 
definition of infinite sets as an application. 

AC crystallized in Zermelo’s mind after an incident at the 1904 International 
Congress of Mathematicians in Heidelberg. At the Congress, in August, Julius 
König presented an argument that there is no well-ordering of R. The next day, 
Zermelo found a mistake in König’s reasoning, and began to work on the well-ordering 
problem himself. It became clear to him that AC was the key idea, and 
with it he obtained a proof of the well-ordering theorem on September 24, 1904. He 
sent it to Hilbert, who quickly arranged for its publication. 

The reaction to Zermelo’s proof was mostly hostile, even from mathematicians 
who had unconsciously used AC in their previous work. For example, Borel (1905) 
(writing on December 1, 1904) correctly identified the essence of Zermelo’s proof, 
but denied that his argument could be part of mathematics: 


Such reasoning seems to me to be no more justified than the following: “To well order a set 
M, it suffices to choose arbitrarily an element to which one assigns rank 1, then another to 
which one assigns rank 2, and so on transfinitely; that is, until all elements of M have been 
exhausted ...” But no mathematician would regard this reasoning as valid. 


Borel (1905), p. 195. 


Supporters of AC were fewer, and less eminent, yet they produced the two most 
lasting results of the immediate post-Zermelo years: the basis of R over Q due to 
Hamel (1905), and the nonmeasurable set of Vitali (1905). 

As mentioned in the exercises to Sect. 7.2, Hamel used his basis to produce a 
discontinuous real function f with the additive property: 


  f(x + y) = f(x) + f(y). 


The existence of such a function had been an open problem ever since Cauchy 
(1821), pp. 104-106, showed that the only continuous additive functions are those of 
the form f(x) = ax for constant a. (Cauchy’s proof was outlined in Exercises 3.4.5 
and 3.4.6.) Cauchy’s theorem has implications for the theory of measure on the line 
or the plane, where the measure μ(A ∪ B) of disjoint sets A and B is supposed to be 
μ(A) + μ(B). If μ varies continuously with the endpoints c and d of an interval on 
the line, then μ is necessarily a constant multiple of the usual length function |d − c| 
on intervals. Similarly, a continuous measure on subsets of the plane is a constant 
multiple of the usual area function on rectangles. 

Vitali (1905), as mentioned above, used the congruence classes of R to obtain a 
nonmeasurable set. We defer a full explanation of measurability until Chap. 9, but it 
is worth mentioning here that the founders of measure theory, Borel and Lebesgue, 
rejected AC. They tended to accept countable AC (or to use it unconsciously), 
however, because measure theory is not really possible without it. 

After a few years of hostility, support for AC began to grow. Steinitz (1910) used 
AC to prove that every field F has an algebraic closure F̄; that is, F̄ ⊇ F and every 
polynomial equation with coefficients in F̄ has a solution in F̄. This was the first 
of many results about “closed” or “maximal” algebraic structures that depend on 
AC, so algebraists became strong supporters of AC. Indeed, by the 1930s algebra 
was influencing the presentation of set theory by favoring maximal principles like 
Zorn’s lemma⁴ over the equivalent principles of AC and the well-ordering theorem. 
One can see this in the book of Bourbaki (1939) which does not use ordinals at all, 
and mentions them only in an exercise. 

At the same time, set theorists investigated AC and discovered that it had 
many interesting and/or bizarre consequences. Some of the most interesting are 
nonmeasurable sets, but the undetermined sets of Sect. 7.6 are also of interest. The 
theorem on winning strategies for finite games that instigated this line of research is 
due to Zermelo (1913). The generalization to infinite games was entertained by the 
Polish mathematicians Steinhaus, Banach, and Mazur in the 1920s, as Steinhaus 
(1965) reported. But they dropped the idea after Banach and Mazur found that 
winning strategies do not always exist, if AC holds. The subject was revived in 
the 1960s by Steinhaus and Mycielski, who thought that AD could be a useful 
alternative to AC in analysis. Their confidence was borne out by Mycielski and 
Świerczkowski (1964), who proved that AD implies countable AC for sets of reals
and that all subsets of R are measurable. 

The consequences of AC were sometimes bizarre, but they were never contradictory,
and Gödel (1938) explained why. As described in Sect. 7.3, he defined the
class of constructible sets, which satisfies all the axioms of ZF plus AC. It follows 
that no contradiction can arise from AC unless ZF itself is contradictory. 

So, the consequences of AC are not contradictory, but Gödel did not show that
they are true. He could not, because Cohen (1963) showed that it is also consistent
with ZF to assume that AC is false, even in some very specific instances. For
example, Cohen showed that one can consistently assume that there is an infinite
set of real numbers with no countably infinite subset. Using Cohen’s methods,
Feferman and Levy (1963) showed that it is even consistent to assume that R is
a countable union of countable sets.


⁴ Zorn’s lemma gets its name from Zorn (1935), but it is actually due to Kuratowski (1922). The
name stuck after Bourbaki (1939) called it “Zorn’s theorem.”

Fig. 7.3 Kurt Gödel and Paul Cohen

The message of Gödel and Cohen’s results is that the ZF axioms are very far from
complete. They fail to settle many questions about the real numbers that are settled 
(often in contrary ways) by AC and AD. Thus, new axioms are called for, but so far 
none as compelling as the ZF axioms have been proposed. AC has been the most 
popular new axiom, because it makes the universe more orderly from at least two 
points of view. For algebraists, AC brings complete or maximal structures, such as 
the algebraic closure of any field; for set theorists, AC makes every set well-ordered, 
hence any two sets are comparable in cardinality. However, AC is disruptive at the 
level of R, where it creates nonmeasurable and undetermined sets. At this level, AD 
has some advantages: it blocks nonmeasurable and undetermined sets, yet allows 
countable AC for sets of reals, which is enough AC for most of analysis. 


7.9.1 AC, AD, and the Natural Numbers 


Since AC and AD have a profound influence on the properties of the real numbers, 
one wonders whether they also affect theorems about the natural numbers. Could it 
be, for example, that every even number > 2 is the sum of two primes if AC holds, 
but not otherwise? Fortunately, no. If a theorem about natural numbers (involving 
only elementary concepts such as addition and multiplication) is provable with the 
help of AC, then it is also provable without AC.

The explanation of this fact lies in Gödel’s class L of constructible sets,
mentioned in Sect. 7.3. The natural numbers (and other elementary concepts) have 
the same meaning in L as they do in the universe of all sets, so proving that a 
theorem T about natural numbers holds in L amounts to proving T outright. But, 
as mentioned in Sect. 7.3, AC is provable in L, so by proving T in L we can avoid 
assuming AC. (In contrast, the real numbers do not have the same meaning in 
L as in the universe of all sets, because it is possible that nonconstructible reals 
exist. Indeed, the existence of nonconstructible reals follows from the presence of 
certain large sets, which are generally believed to exist though admittedly that is not 
provable in ZF.) 

It is a similar, though more complicated, story for AD. If one applies Gödel’s set
construction operations to the set R, one obtains the class L(R) of sets “constructible 
from R.” L(R) is again a model of the ZF axioms, and proving that a theorem T about 
natural numbers holds in L(R) amounts to proving T outright. Indeed, L(R) is just 
the same as L unless we assume the existence of sets large enough to imply the 
existence of nonconstructible reals. But if we assume the existence of sufficiently 
large sets a wonderful thing happens: AD can be proved to hold in L(R). It then 
follows that AD can be eliminated from the proof of any theorem about the natural 
numbers. 

The proof that AD holds in L(R) is a very difficult one due to Woodin in 
1985. Woodin’s proof was never published, but a proof was included in Martin and 
Steel (1989), along with other deep results on determinacy. Neeman (2010) is a 
recent paper entirely dedicated to the proof that AD holds in L(R), assuming that 
sufficiently large sets exist. 


Chapter 8 
Borel Sets 


PREVIEW 


The Borel sets may be described simply as those generated from the open sets by 
the operations of complementation and countable union. But one gains a clearer 
understanding of Borel sets by dividing their generation into stages numbered by 
countable ordinals, with $\Sigma_\alpha$ denoting the class of Borel sets generated by stage $\alpha$.

The foundation for the classification of Borel sets is the universal open set 
constructed in Sect. 5.6. In this chapter we work with subsets of $\mathcal{N}$, the set of
irrational numbers in [0,1], as we did there.

It turns out that the construction of a universal open set “propagates” through
the Borel sets to give a universal $\Sigma_\alpha$ set for each $\alpha$, and the diagonal argument then
shows that $\Sigma_\alpha$ includes Borel sets not in $\Sigma_\beta$ for any $\beta < \alpha$. Thus, the Borel sets are
arranged in a hierarchy, with new sets appearing continually as $\alpha$ increases.

The Borel sets do not exhaust all subsets of $\mathcal{N}$, since we can show that there are
only $2^{\aleph_0}$ Borel sets, as many as there are members of $\mathcal{N}$ or $\mathbb{R}$. But they do show
the scope of the countable union operation (important for the concept of measure 
explored in the next chapter), and the related operation of forming the limit of a 
sequence of functions. Indeed, the functions generated from continuous functions 
by taking limits also form a hierarchy, closely related to the Borel hierarchy, called 
the Baire hierarchy. 


8.1 Borel Sets 


The class $\mathcal{B}$ of Borel sets may be defined in two rather different ways: as the closure
of the class of open sets under the operations of complement and countable union, 
or as the union of a sequence of classes (the first of which is the class of open sets) 
defined by transfinite induction. The “closure” definition is easier to state, so we 
consider it first.

Definition 1. $\mathcal{B}$ is the least set that includes all open sets and is closed under
complement and countable union. That is, $\mathcal{B}$ is the intersection of all sets $\mathcal{S} \subseteq \mathcal{P}(\mathcal{N})$ that include
all open sets and are closed under complement and countable union.


To be more precise, $\mathcal{B}$ is the intersection of all sets $\mathcal{S} \subseteq \mathcal{P}(\mathcal{N})$ with the
properties


(i) Each open subset of $\mathcal{N}$ belongs to $\mathcal{S}$.
(ii) If $X \in \mathcal{S}$ then $\mathcal{N} - X \in \mathcal{S}$.
(iii) If $X_1, X_2, X_3, \ldots \in \mathcal{S}$ then $(X_1 \cup X_2 \cup X_3 \cup \cdots) \in \mathcal{S}$.


Defining a set by closure properties, as here, is typical in modern mathematics. 
However, the glibness of this definition hides a property of Borel sets we would 
like to see: their levels of complexity. Open sets are naturally viewed as the simplest 
Borel sets, and other Borel sets have a complexity that can be measured by the 
number of complements and countable unions needed to construct them. This 
number can be an arbitrary countable ordinal, as we will see in Sect. 8.4. This brings 
us to the second definition of the class $\mathcal{B}$ of Borel sets, where ordinals make their
appearance. 


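Definition 2. $\mathcal{B}$ is the union of the classes $\Sigma_\alpha$ defined inductively as follows (along
with classes $\Pi_\alpha$) for all countable ordinals $\alpha$.


\[
\begin{aligned}
\Sigma_1 &= \{\text{open subsets of } \mathcal{N}\},\\
\Pi_1 &= \{\mathcal{N} - X : X \in \Sigma_1\},\\
\Sigma_\alpha &= \Big\{\text{countable unions of sets in } \bigcup_{\beta<\alpha} \Pi_\beta\Big\},\\
\Pi_\alpha &= \{\mathcal{N} - X : X \in \Sigma_\alpha\}.
\end{aligned}
\]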


It is not clear that all the countable ordinals are really needed in this definition,
because it is not clear that each $\Sigma_\alpha$ includes sets not in any $\Sigma_\beta$ for $\beta < \alpha$. This will
be proved in Sect. 8.4.

Notice that $\Sigma_\alpha$ is defined in a way that avoids distinguishing between successor
and limit ordinals $\alpha$. This greatly assists some later constructions, though it makes
the successor ordinal case look more complicated than necessary. In fact (see 
exercises below) 


\[ \Sigma_{\beta+1} = \{\text{countable unions of sets in } \Pi_\beta\}. \]


It is not particularly hard to show that Definition 1 and Definition 2 are 
equivalent, but the proof depends on countable AC. More specifically, it depends 
on the consequence of countable AC that a countable union of countable sets is 
countable. 


Equivalent definitions of Borel sets. Definitions 1 and 2 define the same class
of sets.

Proof. The class $\bigcup_{\alpha<\omega_1} \Sigma_\alpha$ from Definition 2 certainly includes the open sets,
because they comprise the subset $\Sigma_1$. Next we show that $\bigcup_{\alpha<\omega_1} \Sigma_\alpha$ is closed under
complementation and countable unions. First,


\[
\begin{aligned}
X \in \Sigma_\beta &\Rightarrow \mathcal{N} - X \in \Pi_\beta && \text{by definition of } \Pi_\beta\\
&\Rightarrow \mathcal{N} - X \in \Sigma_\alpha && \text{for any } \alpha > \beta,
\end{aligned}
\]


since $\mathcal{N} - X$ is (trivially) a countable union of copies of itself. Thus, $\bigcup_{\alpha<\omega_1} \Sigma_\alpha$ is
closed under complements.

Observe, as a by-product of this argument, that $X \in \Sigma_\beta$ implies $X \in \Pi_\alpha$ for any
$\alpha > \beta$. Applying this observation to any $X_1, X_2, X_3, \ldots \in \bigcup_{\alpha<\omega_1} \Sigma_\alpha$, we find that each
$X_i \in$ some $\Pi_{\alpha_i}$. By countable AC, the union $\gamma$ of the countably many countable sets
$\alpha_1, \alpha_2, \alpha_3, \ldots$ is a countable ordinal, so


\[ X_1 \cup X_2 \cup X_3 \cup \cdots \in \Sigma_{\gamma+1} \]


by definition of $\Sigma_{\gamma+1}$. This shows that $\bigcup_{\alpha<\omega_1} \Sigma_\alpha$ is closed under countable
unions, and completes the proof that $\bigcup_{\alpha<\omega_1} \Sigma_\alpha$ has the closure properties stated
in Definition 1.

To show that $\bigcup_{\alpha<\omega_1} \Sigma_\alpha$ is the least such set, it suffices to show that any set with
these closure properties contains each $\Sigma_\alpha$. This is immediate from the inductive
Definition 2, since the operations used in the definition are complementation and
countable union. □
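
The interplay between the two definitions can be seen in a finite toy model. The following is a minimal sketch, not part of the original text: it assumes a finite universe, replaces countable unions by pairwise unions, and uses the hypothetical name `generate_by_stages`. It builds the closure of a family of base sets stage by stage (in the spirit of Definition 2) and stops when nothing new appears, at which point the result is closed under complement and union (as in Definition 1). In the finite case the process stops after finitely many stages, whereas the Borel sets need all countable ordinals.

```python
def generate_by_stages(universe, base):
    """Close `base` under complement and pairwise union, stage by stage.

    Returns a list of stages; stage 0 is the base family, and each later
    stage holds only the sets that first appear at that stage.
    """
    universe = frozenset(universe)
    current = {frozenset(s) for s in base}
    stages = [set(current)]
    while True:
        # Complements of everything found so far (finite analogue of the Pi classes).
        complements = {universe - s for s in current}
        pool = current | complements
        # Pairwise unions stand in for countable unions in this finite model.
        unions = {a | b for a in pool for b in pool}
        nxt = pool | unions
        if nxt == current:
            return stages
        stages.append(nxt - current)
        current = nxt


stages = generate_by_stages(range(4), [{0}, {1}])
closure = set().union(*stages)
print(len(stages), len(closure))
```

Starting from the base sets {0} and {1} in the universe {0, 1, 2, 3}, the closure is the family of all unions of the blocks {0}, {1}, {2, 3}.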


Exercises 


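8.1.1 Show that any countable subset of $\mathcal{N}$ is in $\Sigma_2$. Why does this show that $\Sigma_2$ has members not
in $\Sigma_1$?

8.1.2 Prove by induction on $\alpha$ that $\Sigma_\alpha \subseteq \Sigma_{\alpha+1}$, $\Pi_\alpha \subseteq \Pi_{\alpha+1}$, $\Pi_\alpha \subseteq \Sigma_{\alpha+1}$, and $\Sigma_\alpha \subseteq \Pi_{\alpha+1}$.

8.1.3 Deduce that $\Sigma_{\beta+1} = \{\text{countable unions of sets in } \Pi_\beta\}$.

8.1.4 Prove that a countable union of $\Pi_\alpha$ sets is $\Sigma_{\alpha+1}$, and a countable intersection of $\Sigma_\alpha$ sets is
$\Pi_{\alpha+1}$.

8.1.5 Let $\alpha_1 < \alpha_2 < \alpha_3 < \cdots$ be a fixed sequence of countable ordinals with limit $\lambda$. Show that
any $X \in \Sigma_\lambda$ is of the form


\[ X = X_1 \cup X_2 \cup X_3 \cup \cdots \quad \text{where } X_i \in \Pi_{\alpha_i}. \]


(Hint: Insert empty sets $X_i$ in the sequence $X_1, X_2, X_3, \ldots$ where necessary.)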


8.2 Borel Sets and Continuous Functions 


In the previous section we defined the Borel subsets of $\mathcal{N}$. There are similar
definitions for the Borel subsets of $\mathcal{N}^2, \mathcal{N}^3, \ldots$ and so on. The bottom level $\Sigma_1$
consists of the open subsets of $\mathcal{N}^k$, and these are again countable unions of basic
open sets. The only difference is that a basic open set in $\mathcal{N}^k$ is the cartesian product
of basic open sets in each factor $\mathcal{N}$. For example, a basic open subset of $\mathcal{N}^2$ is of the
form $I \times J$, where $I$ and $J$ are basic open subsets of $\mathcal{N}$. $I \times J$ is a rectangle minus
each point with a rational coordinate.

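We are particularly interested in the Borel subsets of $\mathcal{N}^2$, because we wish to
generalize the universal open set $\mathcal{U} \subseteq \mathcal{N}^2$ of Sect. 5.6 to a set $\mathcal{U}_\alpha \subseteq \mathcal{N}^2$ which is
“universal $\Sigma_\alpha$” in the same sense. That is, $\mathcal{U}_\alpha$ is a $\Sigma_\alpha$ subset of $\mathcal{N}^2$, and its sections


\[ \mathcal{U}_\alpha(y) = \{x : (x,y) \in \mathcal{U}_\alpha\} \]


are all the $\Sigma_\alpha$ subsets of $\mathcal{N}$.
To make this possible we generalize the basic property of continuous functions
from open sets to $\Sigma_\alpha$ sets: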


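Inverse images of Borel sets under continuous functions. For each countable
ordinal $\alpha$, and each continuous function $f$ from $\mathcal{N}^j$ onto $\mathcal{N}^k$, $f^{-1}$ of a $\Sigma_\alpha$ set $X \subseteq \mathcal{N}^k$
is $\Sigma_\alpha$, and $f^{-1}$ of a $\Pi_\alpha$ set $X \subseteq \mathcal{N}^k$ is $\Pi_\alpha$.


Proof. We argue by induction on $\alpha$. For $\alpha = 1$, $f^{-1}$ of a $\Sigma_1$ set is $f^{-1}$ of an open set,
hence open; that is, $\Sigma_1$. It follows that $f^{-1}$ of a $\Pi_1$ set is $\Pi_1$ by taking complements.

For the induction step, suppose $f^{-1}$ of a $\Sigma_\beta$ set is $\Sigma_\beta$, and $f^{-1}$ of a $\Pi_\beta$ set is $\Pi_\beta$,
for all $\beta < \alpha$. By Definition 2 of Borel sets, $X \in \Sigma_\alpha$ is of the form


\[ X = X_1 \cup X_2 \cup X_3 \cup \cdots, \]

where each of $X_1, X_2, X_3, \ldots$ belongs to some $\Pi_\beta$ with $\beta < \alpha$. Therefore

\[ f^{-1}(X) = f^{-1}(X_1) \cup f^{-1}(X_2) \cup f^{-1}(X_3) \cup \cdots, \]


and each term in the union is in some $\Pi_\beta$, with $\beta < \alpha$, by the induction hypothesis.
This implies


\[ f^{-1}(X) \in \Sigma_\alpha \quad \text{by Definition 2.} \]


It then follows, by taking complements again, that $f^{-1}$ of a $\Pi_\alpha$ set is $\Pi_\alpha$, and the
induction is complete. □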


The second tool for the construction of a universal $\Sigma_\alpha$ set is a “continuous
encoding” of each sequence of irrational numbers (members of $\mathcal{N}$) by a single
irrational number. Given this tool, if we can encode $\Pi_\beta$ sets, for $\beta < \alpha$, by members
of $\mathcal{N}$, then we can also encode their countable unions (the $\Sigma_\alpha$ sets) by members of
$\mathcal{N}$, in a continuous fashion.


Continuous bijection $g : \mathcal{N} \to \mathcal{N}^{\mathbb{N}}$. There is a function $g$, sending each $y \in \mathcal{N}$ to
a unique sequence $(y_1, y_2, y_3, \ldots) \in \mathcal{N}^{\mathbb{N}}$, such that each member of $\mathcal{N}^{\mathbb{N}}$ is a value
$g(y)$, and each of the coordinate functions $g_k(y) = y_k$ is continuous.

Proof. If $y = (a_1, a_2, a_3, a_4, a_5, \ldots)$, then one way to define $g_1, g_2, g_3, \ldots$ is the
following:


\[
\begin{aligned}
g_1(y) &= (a_1, a_3, a_5, a_7, \ldots),\\
g_2(y) &= (a_2, a_6, a_{10}, a_{14}, \ldots),\\
g_3(y) &= (a_4, a_{12}, a_{20}, a_{28}, \ldots),\\
&\;\;\vdots
\end{aligned}
\]


In other words, $g_1(y)$ omits every other term of the sequence $y$; $g_2(y)$ omits every
other term of the sequence omitted by $g_1$; $g_3(y)$ omits every other term of the
sequence omitted by $g_1$ and $g_2$; and so on.

It is clear that the sequence


\[ g(y) = (y_1, y_2, y_3, \ldots) = (g_1(y), g_2(y), g_3(y), \ldots) \]


is uniquely determined by $y$. Also, any sequence $(y_1, y_2, y_3, \ldots)$ in $\mathcal{N}^{\mathbb{N}}$ is obtainable
for suitable choice of $y$, because $y$ can in fact be assembled from $(y_1, y_2, y_3, \ldots)$.
Thus, $g : \mathcal{N} \to \mathcal{N}^{\mathbb{N}}$ is a bijection.

Finally, each $g_k$ is continuous, because we can make $g_k(y')$ arbitrarily close to
$g_k(y)$ by choosing $y'$ sufficiently close to $y$; that is, by making $y'$ agree with $y$ on a
sufficiently long initial segment $(a_1, a_2, a_3, \ldots, a_l)$. □


This proof works because $y$ is separated (or partitioned) into infinitely many
disjoint subsequences. Any other partition of $y$ into infinitely many infinite
subsequences works equally well: we still get a
bijection $y \leftrightarrow (y_1, y_2, y_3, \ldots)$, and $g_k(y')$ is arbitrarily close to $g_k(y)$ if $y'$ and $y$ agree
on a sufficiently long initial segment. Thus, $y$ in a reasonable sense encodes the
sequence $(y_1, y_2, y_3, \ldots)$, and the encoding is continuous.
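
The index pattern in the proof can be made explicit: the $m$th term of $g_k(y)$ sits at position $2^{k-1}(2m-1)$ of $y$. The following minimal sketch illustrates this on finite prefixes. It is not from the original text; the names `index_for`, `g`, and `assemble` are hypothetical, and a sequence is modelled as a Python function from 1-based positions to terms.

```python
def index_for(k, m):
    """1-based position in y of the m-th term of g_k(y)."""
    return 2 ** (k - 1) * (2 * m - 1)


def g(k, y, count):
    """First `count` terms of g_k(y); y maps a 1-based position to a term."""
    return [y(index_for(k, m)) for m in range(1, count + 1)]


def assemble(rows, position):
    """Recover the term of y at `position` from the sequences y_1, y_2, ...

    rows(k, m) must return the m-th term of y_k. Every positive integer is
    uniquely 2**(k-1) * (2*m - 1), which is why g is a bijection.
    """
    k = 1
    while position % 2 == 0:
        position //= 2
        k += 1
    m = (position + 1) // 2
    return rows(k, m)


y = lambda i: i  # the sequence (1, 2, 3, ...), so terms equal their positions
print(g(1, y, 4), g(2, y, 4), g(3, y, 4))
```

With `y` equal to the identity sequence, the printed lists are exactly the subscripts appearing in the displayed formulas for $g_1$, $g_2$, $g_3$. The function `assemble` inverts the splitting, which is the finite shadow of the claim that $y$ can be assembled from $(y_1, y_2, y_3, \ldots)$.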


Exercises 


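8.2.1 Use the bijection $p : \mathbb{N}^2 \to \mathbb{N}$ from Sect. 3.2 to obtain another bijection $\mathcal{N} \to \mathcal{N}^{\mathbb{N}}$.
8.2.2 Show that the function $g : \mathcal{N} \to \mathcal{N}^{\mathbb{N}}$ has a unique continuous extension $\overline{g} : [0, 1] \to [0, 1]^{\mathbb{N}}$.
8.2.3 Is $\overline{g}$ a surjection? A bijection?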


8.3 Universal $\Sigma_\alpha$ Sets


With the two theorems of the previous section (preservation of the $\Sigma_\alpha$ and $\Pi_\alpha$
properties by inverses of continuous functions, and the continuous encoding of
sequences by single elements of $\mathcal{N}$) we are now ready to extend the construction
of a universal set from $\Sigma_1$ to all levels of the Borel hierarchy.

Universal $\Sigma_\alpha$ set. For each countable ordinal $\alpha$ there is a $\Sigma_\alpha$ set $\mathcal{U}_\alpha \subseteq \mathcal{N}^2$ whose
sections


\[ \mathcal{U}_\alpha(y) = \{x : (x,y) \in \mathcal{U}_\alpha\} \]


are all the $\Sigma_\alpha$ subsets of $\mathcal{N}$.


Proof. We argue by induction on $\alpha$. For $\alpha = 1$ we can take $\mathcal{U}_1$ to be the universal
open set constructed in Sect. 5.6, since open sets are $\Sigma_1$.

For the induction step we suppose that for each $\beta < \alpha$ there is a $\Sigma_\beta$ set $\mathcal{U}_\beta$ whose
sections


\[ \mathcal{U}_\beta(y) = \{x : (x,y) \in \mathcal{U}_\beta\} \]


are all the $\Sigma_\beta$ subsets of $\mathcal{N}$. (More precisely, we have to choose a universal $\Sigma_\beta$ set
$\mathcal{U}_\beta$ for each of the countably many $\beta < \alpha$, using countable AC.) It follows that
$\mathcal{N}^2 - \mathcal{U}_\beta$ is a universal $\Pi_\beta$ set, because its sections are precisely the complements
of the sections $\mathcal{U}_\beta(y)$; that is, the $\Pi_\beta$ subsets of $\mathcal{N}$.

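Now to form $\Sigma_\alpha$ sets we need to form countable unions of $\Pi_\beta$ sets for (possibly
various) $\beta < \alpha$. To do this uniformly we first choose a sequence $\beta_1 \le \beta_2 \le \beta_3 \le \cdots$
with limit $\alpha$ if $\alpha$ is a limit ordinal and limit $\gamma$ if $\alpha = \gamma + 1$. In either case, each $X \in \Sigma_\alpha$
is of the form


\[ X = X_1 \cup X_2 \cup X_3 \cup \cdots \quad \text{where each } X_n \in \Pi_{\beta_n}. \]


By our induction hypothesis we have a universal $\Sigma_{\beta_n}$ set $\mathcal{U}_{\beta_n}$ for each $\beta_n$, of which
$X_n$ is the complement of a section $\mathcal{U}_{\beta_n}(y_n)$. That is,


\[ x \in X_n \Leftrightarrow (x, y_n) \notin \mathcal{U}_{\beta_n}. \]

And therefore

\[ x \in X \Leftrightarrow (x, y_n) \notin \mathcal{U}_{\beta_n} \text{ for some } n. \]


We also know, from the previous section, that each sequence $(y_1, y_2, y_3, \ldots)$ occurs
as $(g_1(y), g_2(y), g_3(y), \ldots)$ for some $y \in \mathcal{N}$. For this $y$ we therefore have


\[ x \in X \Leftrightarrow (x, g_n(y)) \notin \mathcal{U}_{\beta_n} \text{ for some } n. \]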


In this sense $y$ “encodes” the $\Sigma_\alpha$ set $X$, and we have a “universal” set $\mathcal{U}_\alpha$
defined by


\[ (x,y) \in \mathcal{U}_\alpha \Leftrightarrow (x, g_n(y)) \notin \mathcal{U}_{\beta_n} \text{ for some } n, \]

because all of its sections $\mathcal{U}_\alpha(y) = \{x : (x, g_n(y)) \notin \mathcal{U}_{\beta_n} \text{ for some } n\}$ are $\Sigma_\alpha$, and
they include all the $\Sigma_\alpha$ sets $X$. Thus, it remains only to prove that $\mathcal{U}_\alpha$ is itself a
$\Sigma_\alpha$ set.

It suffices to show that $\mathcal{U}_\alpha$ is a countable union of $\Pi_{\beta_n}$ sets, and indeed that, for
each $n$,


\[ \{(x,y) : (x, g_n(y)) \notin \mathcal{U}_{\beta_n}\} \text{ is } \Pi_{\beta_n}. \]


We show, equivalently, that its complement


\[ V_n = \{(x,y) : (x, g_n(y)) \in \mathcal{U}_{\beta_n}\} \text{ is } \Sigma_{\beta_n}. \]


To do this we recall that $g_n$ is continuous, and hence so is the function $(x,y) \mapsto
(x, g_n(y))$. $V_n$ is the inverse image of the $\Sigma_{\beta_n}$ set $\mathcal{U}_{\beta_n}$ under the latter function. So
$V_n$ is also $\Sigma_{\beta_n}$, as required, by the first theorem of the previous section. □


Exercises 


8.3.1 Explain why the sets $\mathcal{U}_\alpha(y) = \{x : (x, g_n(y)) \notin \mathcal{U}_{\beta_n} \text{ for some } n\}$ are $\Sigma_\alpha$.

8.3.2 Re-prove the induction step in the special case where $\alpha = \gamma + 1$. (In this case we can assume
that $X \in \Sigma_{\gamma+1}$ has the form $X = X_1 \cup X_2 \cup X_3 \cup \cdots$ where each $X_n \in \Pi_\gamma$, so there is no need
to use a sequence $\beta_1 \le \beta_2 \le \beta_3 \le \cdots$.)


As noted in the second paragraph of the proof, we are using countable AC. Indeed we cannot prove
that the Borel hierarchy extends beyond $\Sigma_3$ without using countable AC. This is due to the result,
mentioned in the exercises to Sect. 7.7, that it is consistent with ZF to assume that R is a countable
union of countable sets.


8.3.3 Under this assumption, show that all sets of real numbers are in $\Sigma_3$.


8.4 The Borel Hierarchy 


We are now ready to show that each level $\Sigma_\alpha$ of the Borel hierarchy has members
not in $\Sigma_\beta$ for any $\beta < \alpha$. We know that


1. $\Sigma_\alpha$ includes each member $X$ of $\Pi_\beta$ for any $\beta < \alpha$ (as the union of countably many
copies of $X$).


So if we can prove that

2. $\Pi_\alpha$ has a member $Y$ not in $\Sigma_\alpha$,

then it will follow that

3. $\Pi_\alpha$ has a member $Y$ not in $\Pi_\beta$ for any $\beta < \alpha$.

Therefore (taking complements),

4. $\Sigma_\alpha$ has a member $\mathcal{N} - Y$ not in $\Sigma_\beta$ for any $\beta < \alpha$.

Borel hierarchy theorem. $\Pi_\alpha$ has a member not in $\Sigma_\alpha$ (and therefore, as explained
above, $\Sigma_\alpha$ has a member not in $\Sigma_\beta$ for any $\beta < \alpha$).


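Proof. We apply the diagonal argument to the universal $\Sigma_\alpha$ set $\mathcal{U}_\alpha$ from the previous
section. By construction, the sections of $\mathcal{U}_\alpha$, $\mathcal{U}_\alpha(y) = \{x : (x,y) \in \mathcal{U}_\alpha\}$, are all the
$\Sigma_\alpha$ subsets of $\mathcal{N}$. But the “complementary diagonal set” $D_\alpha = \{x : (x, x) \notin \mathcal{U}_\alpha\}$ is
unequal to each section $\mathcal{U}_\alpha(y)$. In fact,


\[ y \in D_\alpha \Leftrightarrow (y, y) \notin \mathcal{U}_\alpha \Leftrightarrow y \notin \mathcal{U}_\alpha(y), \]


so $D_\alpha$ differs from $\mathcal{U}_\alpha(y)$ regarding the element $y$. Thus, $D_\alpha$ is not in $\Sigma_\alpha$.

But $D_\alpha = \{x : (x, x) \notin \mathcal{U}_\alpha\}$ is in $\Pi_\alpha$, as we prove by induction on $\alpha$.

For the base step, $\alpha = 1$, we observe that the open set $\mathcal{U}_1$ is a countable union
of open rectangles, so it meets the diagonal $\{(x, x) : x \in \mathcal{N}\}$ in a countable union of
open intervals. Then its projection $\{x : (x, x) \in \mathcal{U}_1\}$ on the $x$-axis is also a countable
union of intervals, hence $\Sigma_1$. The complement $D_1$ of this projection is therefore $\Pi_1$.

Notice that this argument applies with any $\Sigma_1$ set $V_1$ in place of $\mathcal{U}_1$.

For the induction step our hypothesis is that, for each $\beta < \alpha$,


$\{x : (x, x) \in V_\beta\}$ is $\Sigma_\beta$ for any $\Sigma_\beta$ set $V_\beta$, or equivalently
$\{x : (x, x) \in W_\beta\}$ is $\Pi_\beta$ for any $\Pi_\beta$ set $W_\beta$.


Now consider $D_\alpha = \{x : (x, x) \notin \mathcal{U}_\alpha\}$. We know $\mathcal{U}_\alpha = W_1 \cup W_2 \cup \cdots$, where each
$W_i \in \Pi_{\beta_i}$ for some $\beta_i < \alpha$, by definition of $\Sigma_\alpha$. So


\[
\begin{aligned}
D_\alpha = \{x : (x, x) \notin \mathcal{U}_\alpha\} &= \mathcal{N} - \{x : (x, x) \in \mathcal{U}_\alpha\}\\
&= \mathcal{N} - \Big\{x : (x, x) \in \bigcup_{i=1}^{\infty} W_i\Big\}\\
&= \mathcal{N} - \bigcup_{i=1}^{\infty} \{x : (x, x) \in W_i\}\\
&= \mathcal{N} - \bigcup_{i=1}^{\infty} (\text{some } \Pi_{\beta_i} \text{ set}) && \text{by induction}\\
&= \mathcal{N} - (\text{some } \Sigma_\alpha \text{ set}) && \text{since } \beta_i < \alpha,
\end{aligned}
\]

which is in $\Pi_\alpha$. This completes the induction, so $D_\alpha = \{x : (x, x) \notin \mathcal{U}_\alpha\}$ is a $\Pi_\alpha$ set
not in $\Sigma_\alpha$, as required. □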
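
The diagonal step of the proof has a simple finite analogue, sketched below. This is an illustration only, under the assumption that the "sections" are finitely many subsets of {0, ..., n-1} indexed by the same numbers; the name `complementary_diagonal` is hypothetical. The set D differs from the section with index y precisely at the element y, so it cannot equal any section.

```python
def complementary_diagonal(sections):
    """Return D = {x : x not in sections[x]} for a list of sets of indices."""
    return {x for x in range(len(sections)) if x not in sections[x]}


sections = [{0, 1}, {2}, set(), {0, 3}]
D = complementary_diagonal(sections)
print(D)
# D disagrees with sections[y] about the element y, for every index y.
print(all((y in D) != (y in sections[y]) for y in range(len(sections))))
```

In the theorem the index set is $\mathcal{N}$ rather than a finite set, and the extra work in the proof lies in showing that the diagonal set is still in $\Pi_\alpha$.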


The Borel hierarchy theorem gives content to the following definition, which 
captures the concept of “complexity” of Borel sets S. 


Definition. The least $\alpha$ such that $S \in \Sigma_\alpha$ is called the Borel rank of $S$.


By the hierarchy theorem, all countable ordinals occur as Borel ranks.


Exercises


Most “natural” examples of Borel sets occur at quite low levels of the hierarchy. An example is the 
set of normal numbers, where a number x is called normal (in base 2) if the digits 0 and 1 occur 
with equal frequency in the binary expansion of x. In other words: if there are m occurrences of the 
digit 1 in the first $n$ binary digits of $x$, then $m/n \to 1/2$ as $n \to \infty$.


8.4.1 By formalizing the above statement about limits, show that
$x$ is normal $\Leftrightarrow$ for all $\varepsilon > 0$ there is an $N$ such that


\[ \frac{1}{2} - \varepsilon < \frac{m}{n} < \frac{1}{2} + \varepsilon \quad \text{for all } n > N, \]


where $m$ = number of occurrences of 1 in the first $n$ digits of $x$.
Now we translate this statement about digits into a statement about sets.


8.4.2 Explain why, if [0,1] is divided into $2^n$ equal subintervals $I_k$, the numbers $x$ in $I_k \cap \mathcal{N}$ agree
in their first $n$ binary digits. (So it makes sense to speak of “the number of occurrences, $m$,
of 1 in the first $n$ digits” in $I_k$.)

8.4.3 For each $\varepsilon > 0$ and positive integer $n$ let $U_{\varepsilon,n}$ be the union of the intervals $I_k$ for which


\[ \frac{1}{2} - \varepsilon < \frac{m}{n} < \frac{1}{2} + \varepsilon. \]


Explain why $U_{\varepsilon,n}$ is a $\Sigma_1$ set.
8.4.4 Explain why


\[ x \text{ is normal} \Leftrightarrow \text{for all } \varepsilon,\ x \in \bigcup_{N} \bigcap_{n>N} U_{\varepsilon,n}. \]


Finally, we restrict the values of $\varepsilon$ to $\varepsilon = 1/M$, for positive integers $M$.


8.4.5 Explain why


\[ x \text{ is normal} \Leftrightarrow x \in \bigcap_{M} \bigcup_{N} \bigcap_{n>N} U_{1/M,n}, \]


and deduce that the set of normal numbers is $\Pi_4$. (A more refined argument actually shows
that the set is $\Pi_3$.)
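
To get a feel for the sets $U_{\varepsilon,n}$, one can list the subintervals that make them up for small $n$. The sketch below is not part of the exercises: it assumes that each of the $2^n$ subintervals is identified with the string of $n$ binary digits shared by its irrational members, and that the defining condition is $|m/n - 1/2| < \varepsilon$. The name `intervals_in_U` is hypothetical.

```python
from fractions import Fraction
from itertools import product


def intervals_in_U(eps, n):
    """Digit strings of length n (one per subinterval) with |m/n - 1/2| < eps,

    where m is the number of 1s in the string. Exact rational arithmetic
    avoids rounding at the boundary of the condition.
    """
    half = Fraction(1, 2)
    return [digits for digits in product((0, 1), repeat=n)
            if abs(Fraction(sum(digits), n) - half) < eps]


# For n = 4 and eps = 1/8 only the strings with exactly two 1s qualify.
print(len(intervals_in_U(Fraction(1, 8), 4)))
```

Since each $U_{\varepsilon,n}$ is a union of finitely many intervals, this kind of enumeration is always finite, which is the starting point for the more refined rank estimate mentioned in 8.4.5.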


8.5 Baire Functions 


After this long trek into the wilderness of Borel sets, we are in a better position 
to understand the interaction between two basic concepts of analysis: continuity 
and limits. We know from Sect. 4.6 that a uniformly convergent sequence of 
continuous functions has a continuous limit, so uniformly convergent sequences 
lead to nothing new. But, as we also saw in Sect. 4.6, a merely convergent 
sequence of continuous functions may have a discontinuous limit. An example is the
function

\[ f(x) = \begin{cases} 1 & \text{if } x = 0 \\ 0 & \text{if } x \neq 0, \end{cases} \]


which is the limit of continuous functions with a “spike” at x = 0 (see the picture in 
Sect. 4.6). 

By repeatedly taking limits of previously defined functions we obtain an
increasing sequence of function classes $B_\alpha$ called the Baire hierarchy. The classes
$B_\alpha$ are defined for all countable ordinals $\alpha$ as follows.


\[
\begin{aligned}
B_0 &= \{\text{continuous functions } \mathbb{R} \to \mathbb{R}\},\\
B_\alpha &= \{\text{limits of convergent sequences of functions from the } B_\beta \text{ with } \beta < \alpha\}.
\end{aligned}
\]


In this section we will only sketch some basic results on Baire functions, since we 
do not need the results later. For this purpose we can stick with the domain R, since 
the domain N becomes convenient (as with Borel sets) only when more technical 
details are required. In R we have the rational numbers, which are involved in some 
of the most interesting Baire functions in the low levels of the hierarchy. 

For example, the discontinuous function $f(x)$ defined above is a member of $B_1$,
and so is the function $f(x, r)$ defined for each rational number $r$ by


\[ f(x, r) = \begin{cases} 1 & \text{if } x = r \\ 0 & \text{if } x \neq r. \end{cases} \]


If we then take an enumeration $r_1, r_2, r_3, \ldots$ of the rational numbers, the functions


\[ g_n(x) = f(x, r_1) + \cdots + f(x, r_n) \]


are also in $B_1$, because each $g_n$ is the limit of a sequence of continuous functions
with “spikes” at $r_1, \ldots, r_n$. The limit of the sequence $g_1, g_2, g_3, \ldots$ exists and equals


\[ d(x) = \begin{cases} 1 & \text{if } x \text{ is rational} \\ 0 & \text{if } x \text{ is irrational.} \end{cases} \]


Thus, the highly discontinuous Dirichlet function of Sect. 1.5 is in $B_2$. This is
what we meant when we said that the Dirichlet function is “not far removed” from
continuity.
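
The spike construction can be sketched numerically. The following is an illustration under assumptions, not the construction pictured in Sect. 4.6: it takes each spike to be a "tent" of height 1 and half-width $1/k$, and the names `spike` and `g_approx` are hypothetical. For fixed $x$, the value of the continuous approximation settles down to $g_n(x)$ once $k$ is large enough.

```python
from fractions import Fraction


def spike(x, r, k):
    """Continuous tent function: 1 at x = r, 0 when |x - r| >= 1/k."""
    return max(0, 1 - k * abs(x - r))


def g_approx(x, rationals, k):
    """Continuous approximation to g_n(x) = f(x, r_1) + ... + f(x, r_n)."""
    return sum(spike(x, r, k) for r in rationals)


rs = [Fraction(0), Fraction(1, 2), Fraction(1, 3)]
# At a listed rational the value is 1 for large k; elsewhere it drops to 0.
print(g_approx(Fraction(1, 2), rs, 1000), g_approx(Fraction(1, 7), rs, 1000))
```

Note that the value of $k$ needed depends on $x$, which is why the convergence is pointwise but not uniform, and why the limit can be discontinuous.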

It is easy to guess that there is a connection between Baire functions and Borel 
sets, and the connection becomes clearer when continuity and limits are expressed 
in terms of sets. The key facts are the following. 


1. As we already know from Sect. 5.2, a function $f$ is continuous if and only if
$f^{-1}(U)$ is open for any open set $U$. Thus, $f^{-1}$ of a $\Sigma_1$ set is $\Sigma_1$, when $f$ is
continuous.

2. If $f(x) = \lim_{n\to\infty} f_n(x)$ and $U$ is open, then

\[
\begin{aligned}
x \in f^{-1}(U) &\Leftrightarrow f(x) \in U\\
&\Leftrightarrow f_n(x) \in U \text{ for all sufficiently large } n\\
&\Leftrightarrow \text{there is an } m \text{ such that } f_n(x) \in U \text{ for all } n > m\\
&\Leftrightarrow \text{there is an } m \text{ such that } x \in f_n^{-1}(U) \text{ for all } n > m\\
&\Leftrightarrow \text{there is an } m \text{ such that } x \in \bigcap_{n>m} f_n^{-1}(U)\\
&\Leftrightarrow x \in \bigcup_{m} \bigcap_{n>m} f_n^{-1}(U).
\end{aligned}
\]


Thus, $f^{-1}(U)$ is a countable union of countable intersections of the sets $f_n^{-1}(U)$.


These two facts enable an inductive proof that for any Baire function $f$, $f^{-1}$ of
an open set is in $\Sigma_\beta$, for some $\beta$. A more careful proof (see, e.g., Kechris (1995),
p. 190) shows in fact that $f^{-1}$ of an open set is in $\Sigma_{\alpha+1}$ when $f$ is in $B_\alpha$.

Conversely, one can show that every Borel set arises as $f^{-1}$ of an open set for
some Baire function $f$. In fact, one can show that the characteristic function $\chi_A$ of
each Borel set $A$ is Baire, where


\[ \chi_A(x) = \begin{cases} 1 & \text{if } x \in A \\ 0 & \text{if } x \notin A. \end{cases} \]

For example, the characteristic function of the $\Sigma_2$ set $\mathbb{Q}$ is the Dirichlet function,
which we have shown to be in $B_2$. To prove that the characteristic function of any
Borel set is Baire, we use induction on the construction of Borel sets. The base
step is to show that the characteristic function of any open interval $(a, b)$ is in $B_1$
(exercise).

For the induction step, we use two closure properties of Baire functions that are 
easy to prove (see exercises): 


Closure Under Sums. If $f$ and $g$ are Baire then so is $f + g$.
Closure Under Composites. If $f$ and $g$ are Baire, then so is $f \circ g$, defined by


\[ (f \circ g)(x) = f(g(x)). \]


Assuming that these closure properties hold, first suppose that $A$ is a Borel set whose
complement $\mathcal{N} - A$ has a Baire characteristic function. Then $\chi_A$ is the composite of
$\chi_{\mathcal{N}-A}$ with the continuous function


\[ g(x) = 1 - x, \]

which exchanges the values 0 and 1. The function $g$ is certainly Baire, hence so is
$\chi_A$.


Finally, suppose that $A$ is the union


\[ A = A_1 \cup A_2 \cup A_3 \cup \cdots, \]

where each of the characteristic functions $\chi_{A_1}, \chi_{A_2}, \chi_{A_3}, \ldots$ is Baire. By closure under
sums, each of the functions


\[ f_n(x) = \chi_{A_1}(x) + \cdots + \chi_{A_n}(x) \]

is also Baire. Clearly,


\[ f_n(x) = \begin{cases} \text{some positive integer} & \text{if } x \in A_1 \cup \cdots \cup A_n \\ 0 & \text{otherwise.} \end{cases} \]


So if we compose $f_n$ with the Baire function


\[ h(x) = \begin{cases} 1 & \text{if } x > 0 \\ 0 & \text{otherwise,} \end{cases} \]


then we get $\chi_{A_1 \cup \cdots \cup A_n}$. Finally, $\chi_A$ is Baire, as the limit of the Baire functions
$\chi_{A_1 \cup \cdots \cup A_n}$.

Thus, we have an inductive proof that $\chi_A$ is Baire for any Borel set $A$, and hence
that every Borel set has the form $f^{-1}(U)$ for some open set $U$ and Baire function $f$
(taking $U$ to be a small open set that includes 1).

Combining this result with the previous result that $f \in B_\alpha$ gives Borel sets $f^{-1}(U)$
of bounded Borel rank (in fact, in $\Sigma_{\alpha+1}$), we conclude that no $B_\alpha$ includes all Baire
functions. Thus, new Baire functions occur in $B_\alpha$ for arbitrarily large values of $\alpha$,
which means that the Baire classes $B_\alpha$ form a true hierarchy, like the Borel hierarchy.


Exercises 


8.5.1 Show that $\bigcup_{\alpha<\omega_1} B_\alpha$ is the least class of functions that includes the continuous functions
and is closed under limits.

8.5.2 Show by induction on $\alpha$ that if $f, g \in B_\alpha$ then $f + g \in B_\alpha$ and $f \circ g \in B_\alpha$.

8.5.3 Show that the characteristic function of an interval (a, b) is the limit of continuous functions. 

8.5.4 Show that any open set is a countable union of disjoint intervals, so that its characteristic 
function is also a limit of continuous functions. 

8.5.5 Explain why the following formula defines the Dirichlet function


\[ \lim_{m\to\infty} \lim_{n\to\infty} (\cos m!\pi x)^{2n}. \]


8.5.6 Use Exercise 8.5.5 to give an immediate proof that the Dirichlet function is in Baire class 2. 
8.5.7 The formula of Pringsheim (1899), p. 7, has n in place of 2n. Why is his formula also valid? 


8.6 The Number of Borel Sets 


There are at least $2^{\aleph_0}$ Borel sets, because there are that many irrational numbers in
$\mathcal{N}$, and hence that many open intervals in $\mathcal{N}$. It follows that there are $2^{\aleph_0}$ sets in
each $\Sigma_\alpha$, because these are the sets $\mathcal{U}_\alpha(y)$, as $y$ varies over $\mathcal{N}$. The encoding of $\Sigma_\alpha$
sets by $y$ values shows that there are at most $2^{\aleph_0}$ sets in $\Sigma_\alpha$, and there are exactly
this many because $\Sigma_\alpha$ includes all the open intervals. However, there are $\omega_1$ values
of $\alpha$, so it remains unclear how many sets there are in $\bigcup_{\alpha<\omega_1} \Sigma_\alpha$.

There are in fact $2^{\aleph_0}$ Borel sets in total, and to show this we consider all Borel
sets at once, encoding each of them by a tree.

A Borel set $X \in \Sigma_\alpha$ is naturally described by a tree, the top vertex of which
represents $X$. Connected to this top vertex are vertices representing the sets $X_i \in \Pi_{\beta_i}$,
where $\beta_i < \alpha$, such that


\[ X = X_1 \cup X_2 \cup X_3 \cup \cdots. \]


Thus, the top two levels of the tree are as shown in Fig. 8.1. We then connect the
vertex for $X_i \in \Pi_{\beta_i}$ to a vertex below it for $\mathcal{N} - X_i \in \Sigma_{\beta_i}$, and the latter vertex
to vertices for the countably many sets (from $\Pi_{\gamma_j}$, with $\gamma_j < \beta_i$) whose union is
$\mathcal{N} - X_i$, and so on. Each branch of the tree corresponds to a descending sequence of
ordinals $\alpha > \beta_i > \gamma_j > \cdots$, and hence each branch is finite. Moreover, each branch
terminates in a basic open set $G_n$, which we can encode by the natural number $n$.

Fig. 8.1 Top portion of the tree for a Borel set $X$ (top vertex $X$ joined to the vertices $X_1, X_2, X_3, \ldots$ below it)

The set X is determined by the shape of its tree and the natural numbers 
n associated with its terminal vertices. Thus, the problem of encoding Borel 
sets amounts to describing the shapes of trees with finite branches and at most 
countably many descendants of each vertex, and labelling their terminal vertices 
with natural numbers. The means to do this are close at hand; namely, finite 
sequences $(a_1, a_2, \ldots, a_k)$ of natural numbers.

We simply interpret $(a_1, a_2, \ldots, a_k)$ as the branch with successive vertices
$a_1, a_2, \ldots, a_k$, where $a_k$ is the label on the terminal vertex and the other vertices
are labelled in any way that correctly describes the shape of the tree. For example,
it would be natural to label the top vertex by 1, and the vertices immediately below
it by 1, 2, 3, ... (if these vertices are not terminal). It follows that any Borel set can
be encoded by a subset of the set $\mathbb{N}^{<\omega}$ of all finite sequences of natural numbers. We
saw in Sect. 3.1 (Example 8) that $\mathbb{N}^{<\omega}$ is countable, so there are at most as many Borel sets
as there are subsets of a countable set, namely $2^{\aleph_0}$.
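
The idea of coding a tree by a set of finite sequences can be tried out on small finite trees. The sketch below uses a slight variant of the labelling just described, chosen so that decoding is unambiguous: each branch is recorded as the child positions along the branch followed by the terminal label $n$. This variant, the nested-list representation of trees, and the names `encode` and `decode` are assumptions made for illustration. Real Borel trees may have countably many children at a vertex, which a finite list cannot show.

```python
def encode(tree, path=()):
    """Encode a tree as a set of finite sequences of natural numbers.

    A tree is either an int n (a terminal vertex for the basic open set G_n)
    or a non-empty list of subtrees (children numbered 1, 2, 3, ...).
    """
    if isinstance(tree, int):
        return {path + (tree,)}
    codes = set()
    for i, child in enumerate(tree, start=1):
        codes |= encode(child, path + (i,))
    return codes


def decode(codes):
    """Rebuild the tree from the set of sequences produced by `encode`."""
    if len(codes) == 1 and len(next(iter(codes))) == 1:
        return next(iter(codes))[0]  # a single one-term sequence is a terminal vertex
    children = {}
    for seq in codes:
        children.setdefault(seq[0], set()).add(seq[1:])
    return [decode(children[i]) for i in sorted(children)]


tree = [[2, 4], 7, [[1], 3]]
codes = encode(tree)
print(sorted(codes))
print(decode(codes) == tree)
```

Because the code of a tree is just a subset of the countable set of finite sequences, the number of possible codes is at most the number of subsets of a countable set, which is the counting step used in the text.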


Exercises 


8.6.1 Draw the tree for the open set $G_2 \cup G_4 \cup G_6 \cup \cdots$ and describe it by a set of ordered pairs
of natural numbers.


Trees with branches of finite length and vertices of countable degree are also a useful way to
encode countable ordinals. Each vertex is labelled by a countable ordinal $\alpha$ which (if the vertex
is not terminal) is the least ordinal greater than the labels on the vertices below it. The terminal
vertices are labelled by finite ordinals.

For example, Fig. 8.2 shows a tree representing the ordinal $\omega + 1$.

Fig. 8.2 Tree representing $\omega + 1$


8.6.2 Draw a tree representing the ordinal $\omega \cdot 2$.
8.6.3 Prove by induction that every countable ordinal is representable by a tree with branches of 
finite length and vertices of countable degree. 


8.7 Historical Remarks


The Borel sets get their name because of Borel (1898), where they are sketchily 
introduced on pp. 46-47 as a class of sets for which the concept of measure is 
meaningful. Given that the values of a measure are non-negative real numbers, and 
because we can form differences and infinite sums of real numbers, there are two 
arithmetic conditions that measurable subsets of [0,1] should satisfy: 


Subtractivity. If $E$ and $E'$ are measurable, with measures $\mu(E)$ and $\mu(E')$, and if
$E' \subseteq E$, then


\[ \mu(E - E') = \mu(E) - \mu(E'). \]


Countable Additivity. If $S_1, S_2, S_3, \ldots$ are disjoint sets with the measures $\mu(S_1)$,
$\mu(S_2)$, $\mu(S_3)$, ..., then


\[ \mu(S_1 \cup S_2 \cup S_3 \cup \cdots) = \mu(S_1) + \mu(S_2) + \mu(S_3) + \cdots. \]


Thus, implicitly, Borel is considering a class of sets $S \subseteq [0, 1]$ that is closed under
the operations of difference and countable disjoint union. He does not mention the 
“base” sets of this class, but presumably they are sets whose measure is obvious, 
such as open or closed intervals. 

It so happens that the Borel sets in [0,1] can be generated from intervals by 
differences and countable disjoint unions. However, this result is not obvious and 
was apparently first published by Sierpiński (1927). The present definition of Borel
sets, using unrestricted countable unions, seems to have been used since 1902. This
was when the concept of measure reached its mature form, in the thesis of Lebesgue
(1902). Here Lebesgue introduced the concept of Lebesgue measure and proved its
basic properties, including countable additivity. We will say more about Lebesgue
measure, and why its scope includes all Borel sets, in Chap. 9.

Fig. 8.3 Henri Lebesgue

Fig. 8.4 René Baire and Émile Borel

The Baire classes of functions were introduced by Baire (1899), who observed 
that the Dirichlet function is in Baire class 2. He did not investigate higher levels 
of the Baire hierarchy, let alone prove that new functions appear at each level. This 
was first done by Lebesgue (1905), who also proved that new sets appear at each 
level of the Borel hierarchy. 

It is noteworthy that Borel, Baire, and Lebesgue were all skeptical about AC, 
even though they unconsciously used it (or at least countable AC) in their work. 
They were among a large group of mathematicians who became painfully aware 
of AC in 1904, after Zermelo published his proof of the well-ordering theorem.
Moore (1982) has an interesting account of this episode, including translations of
1905 correspondence between Borel, Baire, Lebesgue, and Hadamard (the “elder 
statesman” of the group and the only one to support AC). 

Borel, Baire, and Lebesgue were interested in Borel sets and Baire functions 
because these objects are “definable by formulas” in some sense, and hence linked to 
classical analysis. We have seen, for example, that the Dirichlet function is definable
by the formula


$\lim_{m \to \infty} \lim_{n \to \infty} (\cos m!\pi x)^{2n}$.


They were opposed to AC because it produces sets and functions with no apparent
definitions. But with the realization that many results about Borel sets and Baire 
functions depend on countable AC came the realization that “definability” is not 
an absolutely clear concept. In retrospect, this was not surprising, because their 
“definitions” included series with infinitely many real number coefficients. In fact, 
“definability” is best confined to finite formulas and it is always a relative notion: 
one can speak of definability only in a given language, the symbols and syntax 
of which must be completely specified. Such formal languages were not used in 
mathematics until the 1920s. One such language is for ZF, where all definitions of 
functions must be available in order to state the replacement schema. 

The discovery of the hierarchical properties of Borel sets by Lebesgue (1905) 
revealed, for the first time, a well-defined notion of complexity for sets of reals; 
namely, Borel rank. Set theorists could think about extending results known for 
open or closed sets to more complex sets in a systematic way. One such result was 
Cantor’s theorem that every closed set F has the perfect set property: if uncountable, 
F contains a perfect subset (in which case F has the same cardinality as R). 

Hausdorff (1914), p. 466, mentioned that the perfect set property holds for sets 
with Borel rank < 4, and remarked that there seemed hope of proving it for all Borel 
sets. As mentioned in Sect. 5.7, he did exactly that in Hausdorff (1916), and the 
same result was proved independently by Alexandrov (1916). From this theorem 
of Hausdorff and Alexandrov it follows that the continuum hypothesis holds for all 
Borel sets: if uncountable, a Borel set has the same cardinality as R. 

A rather similar story unfolded between the 1950s and the 1970s, with the 
concept of determinacy (the existence of a winning strategy) in place of the perfect 
set property. Gale and Stewart (1953) proved that open sets are determined, and 
determinacy for sets of Borel rank 2, 3, 4 was laboriously established over the 
next two decades. Then Martin (1975) proved that all Borel sets are determined. 
His proof was a tour de force, using the full resources of ZF+AC. In fact, Borel 
determinacy was known to be difficult before Martin proved it, because Friedman 
(1971) had shown that it is not provable in Zermelo set theory (which has AC 
but not the replacement schema). To this day, Borel determinacy is probably the 
best example of a theorem about the real numbers that depends on the replacement 
schema.

Nevertheless, Borel, Baire, and Lebesgue were right, in some sense, that the
Borel sets are the sets most accessible to the human mind. They are the largest class
of sets for which we can prove in ZF+AC all the “nice” properties: measurability,
continuum hypothesis, and determinacy. In particular, determinacy is not provable in
ZF+AC for the simplest natural extension of the Borel sets, the so-called analytic
sets, which are the projections on $\mathbb{R}$ of Borel subsets of $\mathbb{R}^2$.


Chapter 9 
Measure Theory 


PREVIEW 


In Sect. 1.7 we observed that any countable set has measure zero, because we can
cover its first, second, third, ... points by intervals of lengths $\varepsilon/2$, $\varepsilon/4$, $\varepsilon/8$, .... So
the whole set can be covered by intervals of total length at most $\varepsilon$, which can be as
small as we please.
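The covering argument can be checked with exact arithmetic. The sketch below is only an illustration: the helper names `covering_intervals` and `total_length`, the sample list of rationals, and the choice of 1/100 for the tolerance are assumptions made for the example, not notation from the text.

```python
from fractions import Fraction

def covering_intervals(points, eps):
    """Cover the k-th point (k = 1, 2, 3, ...) by an interval of length eps / 2**k centred on it."""
    intervals = []
    for k, x in enumerate(points, start=1):
        half_width = eps / 2 ** (k + 1)
        intervals.append((x - half_width, x + half_width))
    return intervals

def total_length(intervals):
    """Sum of the lengths of the given intervals (overlaps are counted more than once)."""
    return sum(b - a for a, b in intervals)

# An initial segment of one enumeration of the rationals in [0, 1] (repeats are harmless).
points = [Fraction(p, q) for q in range(1, 8) for p in range(q + 1)]
eps = Fraction(1, 100)
cover = covering_intervals(points, eps)

# The lengths form the geometric series eps/2 + eps/4 + ..., so the total stays below eps.
print(total_length(cover) < eps)
```

However many points are listed, the total length is a partial sum of the same geometric series, which is why the bound does not depend on the size of the countable set.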

This tells us that countable sets can be ignored in measure theory, but also that 
countable union is a useful operation for finding the measure of sets. In this chapter 
we will exploit countable unions of intervals to simultaneously define measure and 
to show that any measurable set can be approximated, within measure $\varepsilon$, by a finite
union of intervals. 

The concept of measure thus obtained is called Lebesgue measure, and it may 
be used to define a new concept of integral—the Lebesgue integral—that greatly 
extends the reach of classical calculus. Moreover, Lebesgue measure also clarifies 
the nature of the classical Riemann integral, by giving an exact description of the 
Riemann-integrable functions. 

The scope of Lebesgue measure is so wide that the nonmeasurable sets can be 
proved to exist only with the help of fairly strong forms of the axiom of choice. 
We give two examples. Because measurability of all sets of reals is incompatible 
with the full axiom of choice, it is not clear what set theory axioms are “ideal” for 
analysis. However, there are some interesting options, and in the Historical Remarks 
we discuss the set theory issues that they raise. 


9.1 Measure of Open Sets 


Since countable sets can be ignored in measure theory, it makes no difference 
whether we work with [0,1] (as contemplated in Sect. 1.7) or with the set of 
irrational numbers in [0,1]. The latter set can be identified with the set $\mathcal{N} = \mathbb{N}^{\mathbb{N}}$
of sequences of natural numbers, as we saw in Sect. 5.6. In the present chapter we
will work with [0,1], because it and its subintervals are the simplest imaginable
measurable sets, yet they suffice as a foundation for measuring highly complex sets, 
including all Borel sets. 

The theory of Borel subsets of $\mathcal{N}$, developed in Chap. 8, transfers easily to [0,1].
The open subsets of [0,1] are the unions of open intervals, which now include the
half-open “end intervals” of the form $[0, b)$ and $(a, 1]$. The Borel subsets of [0,1]
are those subsets obtainable from open subsets by complementation and countable
union. They are very similar to the Borel subsets of $\mathcal{N}$, from which they differ only
by the possible presence of rational members.

In this section we show that each open set $U \subseteq [0,1]$ has a measure $\mu(U)$
compatible with the usual length measure of closed intervals defined by


$\mu([a, b]) = b - a$.


We extend this measure $\mu$ to other intervals (open or half-open) and to countable
unions of intervals by the following two rules:


Subtractivity. If $T \subseteq S$, and the sets $S, T$ have measures $\mu(S)$, $\mu(T)$ respectively,
then $\mu(S - T) = \mu(S) - \mu(T)$.

Countable Additivity. If $S_1, S_2, S_3, \ldots$ are disjoint sets with measures
$\mu(S_1), \mu(S_2), \mu(S_3), \ldots$, then


$\mu(S_1 \cup S_2 \cup S_3 \cup \cdots) = \mu(S_1) + \mu(S_2) + \mu(S_3) + \cdots$.


The extension of $\mu$ proceeds as follows, from closed intervals to open sets, and it
leads to an approximation property that will be the key to further extensions of $\mu$.


1. From the definition $\mu([a, b]) = b - a$ it follows that


$\mu(\{a\}) = \mu([a, a]) = a - a = 0$.


Thus, the measure of any singleton set is 0.

2. It follows, assuming subtractivity for intervals, that


$\mu([a, b)) = \mu([a, b] - \{b\}) = \mu([a, b]) - \mu(\{b\}) = (b - a) - 0 = b - a$,


and similarly $\mu((a, b]) = \mu((a, b)) = b - a$. Thus, open, closed, and half-open
intervals with the same endpoints have the same measure.

3. Assuming additivity for intervals, any finite disjoint union of intervals, with
endpoints $a_i, b_i$ such that


$a_1 < b_1 \le a_2 < b_2 \le \cdots \le a_n < b_n$,


has measure $(b_1 - a_1) + (b_2 - a_2) + \cdots + (b_n - a_n) \le 1$.

4. An open set $U$ is the union of countably many intervals (namely, intervals with
rational endpoints), hence of countably many disjoint intervals by merging any
that overlap. If the $n$th interval has endpoints $a_n, b_n$ then, assuming countable
additivity for intervals,


$\mu(U) = (b_1 - a_1) + (b_2 - a_2) + (b_3 - a_3) + \cdots$

$= \lim_{n \to \infty} [(b_1 - a_1) + (b_2 - a_2) + \cdots + (b_n - a_n)]$.


Since $(b_1 - a_1) + (b_2 - a_2) + \cdots + (b_n - a_n) \le 1$ by the previous example, $\mu(U)$
exists as the limit of a bounded increasing sequence. (This example shows that
the completeness of $\mathbb{R}$ is crucial to the theory of measure.)

5. It also follows, since $\lim_{n \to \infty} [(b_1 - a_1) + (b_2 - a_2) + \cdots + (b_n - a_n)]$ exists, that
$(b_1 - a_1) + (b_2 - a_2) + \cdots + (b_n - a_n)$ is arbitrarily close to $\mu(U)$ for $n$ sufficiently
large. That is, any open set can be approximated within measure $\varepsilon$ by a finite
union of intervals.
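Steps 3 to 5 can be traced on a concrete open set. The sketch below assumes, purely as an example, the open set $U = \bigcup_{n \ge 1} (1/(n+1), 1/n)$, whose complement in [0,1] is countable, so that $\mu(U) = 1$. The function names `merge_intervals`, `total_length`, `components`, and `terms_needed` are hypothetical helpers, not part of the text.

```python
from fractions import Fraction

def merge_intervals(intervals):
    """Merge overlapping open intervals (a, b) into disjoint ones, sorted by left endpoint."""
    merged = []
    for a, b in sorted(intervals):
        if merged and a < merged[-1][1]:  # overlaps the previous interval
            merged[-1] = (merged[-1][0], max(merged[-1][1], b))
        else:
            merged.append((a, b))
    return merged

def total_length(intervals):
    """Measure of a finite disjoint union of intervals (step 3)."""
    return sum(b - a for a, b in intervals)

def components(n_terms):
    """First n_terms disjoint components (1/(n+1), 1/n) of the example open set U."""
    return [(Fraction(1, n + 1), Fraction(1, n)) for n in range(1, n_terms + 1)]

def terms_needed(eps):
    """Smallest number of components whose total length is within eps of mu(U) = 1 (step 5)."""
    n = 1
    while 1 - total_length(components(n)) >= eps:
        n += 1
    return n

# Merging two overlapping intervals gives a single interval (0, 3/4).
print(merge_intervals([(Fraction(0), Fraction(1, 2)), (Fraction(1, 4), Fraction(3, 4))]))
# The partial sums 1 - 1/(n+1) increase to 1, so finitely many components suffice for any eps.
print(terms_needed(Fraction(1, 10)))
```

The partial sums here are bounded and increasing, which is exactly the situation in step 4 where completeness guarantees that the limit exists.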


Exercises 


9.1.1 Prove by induction that Borel sets $S \subseteq [0,1]$ differ from Borel subsets of $\mathcal{N}$ only by the
presence of rational numbers.

9.1.2 Show that any closed set $F$ is contained in a finite union $F'$ of intervals that approximates
$F$ within measure $\varepsilon$.

9.1.3 Illustrate this result in the case of the Cantor set.


9.2 Approximation and Measure 


The approximation property of open sets revealed in the last example (that any
open set differs from a finite union of intervals by a union of intervals of total length
$< \varepsilon$) is generalized in the following:


Definition. A set $S \subseteq [0, 1]$ is said to be approximated within measure $\varepsilon$ by a finite
union $F$ of intervals if $(S - F) \cup (F - S)$ can be covered by intervals of total length
$< \varepsilon$.


Thus, example 5 in the previous section shows that any open subset $U$ of [0, 1]
can be approximated within measure $\varepsilon$ by a finite union of intervals.

We now show that, for any $\varepsilon > 0$, each Borel set $S \subseteq [0, 1]$ can be approximated
within measure $\varepsilon$ by a finite union of intervals. The argument involves summing
series like the series $\frac{1}{2} + \frac{1}{4} + \frac{1}{8} + \cdots$ used at the beginning of this chapter. As usual,
we argue by induction, proving that the property extends to $\Sigma_\alpha$ from the $\Sigma_\beta$ with
$\beta < \alpha$ via the operations of complementation and countable union. However, the
ordinals are only incidental to the proof, so we will call this argument “induction on
the construction of Borel sets.”

In the proof that follows we take the word “interval” to mean any kind of
interval: open, closed, or half-open. As we have seen, this makes no difference as
far as measure is concerned. And it has the advantage that complement, union, and
intersection of finite unions of intervals are again finite unions of intervals.


Borel approximation theorem. For any $\varepsilon > 0$, each Borel set $S \subseteq [0,1]$ can be
approximated within measure $\varepsilon$ by a finite union of intervals.


Proof. We argue by induction on the construction of Borel sets, from open sets
via complements and countable unions. The base step, where $S$ is an open set, has
already been done in example 5 of the previous section.

For the induction step, first suppose that $S$ is the complement of an approximable
Borel set $[0, 1] - S$. That is, $[0, 1] - S$ can be approximated within measure $\varepsilon$ by a
finite union $F$ of intervals. It follows immediately (thanks to the symmetry of the
definition) that $S$ is approximated within measure $\varepsilon$ by $[0, 1] - F$, which is also a
finite union of intervals.

Finally, suppose that $S = S_1 \cup S_2 \cup S_3 \cup \cdots$, where each $S_i$ is an approximable
Borel set. In particular, we have finite unions of intervals


$F_1 = \bigcup_k I_{1k}$, which approximates $S_1$ within measure $\varepsilon/4$,


$F_2 = \bigcup_k I_{2k}$, which approximates $S_2$ within measure $\varepsilon/8$,


$F_3 = \bigcup_k I_{3k}$, which approximates $S_3$ within measure $\varepsilon/16$,


and so on. It follows that the union of all the intervals,


$F = \bigcup_{i,k} I_{ik}$,


approximates $S$ within measure $\varepsilon/2$, because


$\frac{\varepsilon}{4} + \frac{\varepsilon}{8} + \frac{\varepsilon}{16} + \cdots = \frac{\varepsilon}{2}$.


Of course, $F = \bigcup_{i,k} I_{ik}$ may be an infinite union of intervals. But some finite
subset of these intervals has a union $F'$ that approximates $F$ within measure $\varepsilon/2$
(by the argument used for open sets in the previous section), so $F'$ approximates $S$
within measure $\varepsilon$.

This completes the induction. □


Exercises 


9.2.1 Verify that complement, union, and intersection of finite unions of intervals are again finite 
unions of intervals.

The set $(S - F) \cup (F - S)$ is called the symmetric difference of $S$ and $F$, sometimes written
$S \triangle F$. The proof above takes it for granted that if each $S_j \triangle F_j$ can be covered by intervals of total
length at most $\varepsilon/2^{j+1}$ then


$\bigcup_j S_j \triangle \bigcup_j F_j$ can be covered by intervals of total length at most $\sum_j \frac{\varepsilon}{2^{j+1}} = \frac{\varepsilon}{2}$.


A more nitpicking proof may include the following details. 


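9.2.2 Observing that $(\bigcup_j S_j) - F = \bigcup_j (S_j - F)$ and $S_j - F \subseteq S_j - F_j$ (where $F = \bigcup_j F_j$),
deduce that $\bigcup_j S_j - \bigcup_j F_j \subseteq \bigcup_j (S_j - F_j)$.
9.2.3 Observing similarly that $\bigcup_j F_j - \bigcup_j S_j \subseteq \bigcup_j (F_j - S_j)$,
deduce that $\bigcup_j S_j \triangle \bigcup_j F_j \subseteq \bigcup_j (S_j \triangle F_j)$.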


9.3 Lebesgue Measure 


The Borel approximation theorem prompts us to define measurability and measure 
of sets $S \subseteq [0, 1]$ as follows.


Definition. 1. A set $S \subseteq [0, 1]$ is Lebesgue measurable if, for any $\varepsilon > 0$, there is a
finite union $F$ of intervals that approximates $S$ within measure $\varepsilon$.

2. The Lebesgue measure $\mu(S)$ of a Lebesgue measurable set $S$ equals
$\lim_{n \to \infty} \mu(F_n)$, where $F_n$ is a finite union of intervals that approximates $S$ within
measure $1/n$.


Thus, Lebesgue measure extends the measure $\mu$ on intervals to all sets that can
be approximated arbitrarily closely by finite unions of intervals. This definition
of measure resembles the Greek “method of exhaustion” for measuring lengths
and areas of curved figures by approximating them with known figures, such as
polygons.

It follows immediately from this definition and the Borel approximation theorem
that all Borel subsets of [0, 1] are Lebesgue measurable.

However, the Borel sets are far from being all subsets of [0,1]. We saw in Sect. 8.6
that there are only $2^{\aleph_0}$ Borel sets; that is, as many as there are points in [0,1]. And
we know from Sect. 3.8 that [0,1] has more subsets than elements. To find more 
measurable sets we take subsets of the Cantor set C, which is equinumerous with 
[0,1] and of measure 0, by Sect. 3.7. It follows that C has as many subsets as [0,1], 
and all of them are measurable, as subsets of a measure 0 set. 
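That the Cantor set has measure 0 can be seen from its finite-stage approximations. The sketch below assumes the standard middle-third construction; `cantor_stage` and `total_length` are hypothetical helper names. Stage $n$ is a finite union of $2^n$ closed intervals containing $C$, with total length $(2/3)^n$.

```python
from fractions import Fraction

def cantor_stage(n):
    """Closed intervals left after n rounds of removing open middle thirds from [0, 1]."""
    intervals = [(Fraction(0), Fraction(1))]
    for _ in range(n):
        next_intervals = []
        for a, b in intervals:
            third = (b - a) / 3
            next_intervals.append((a, a + third))   # left third
            next_intervals.append((b - third, b))   # right third
        intervals = next_intervals
    return intervals

def total_length(intervals):
    """Total length of a finite disjoint union of intervals."""
    return sum(b - a for a, b in intervals)

# Each stage covers the Cantor set, and the total lengths (2/3)**n shrink towards 0.
for n in (1, 2, 6, 12):
    print(n, len(cantor_stage(n)), total_length(cantor_stage(n)))
```

Since every subset of $C$ is covered by the same intervals, the same finite unions show that each subset of $C$ is approximated within any given measure by the empty union.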

Thus, there are as many measurable sets as there are subsets of [0,1]. To decide 
whether nonmeasurable sets exist we therefore need to know more about Lebesgue 
measure than just the number of measurable sets. The properties of Lebesgue 
measure with the most bearing on this question are countable additivity, mentioned 
in Sect. 9.1, and translation invariance. The latter says that any measurable set S has 
the same measure as $S + r = \{x + r : x \in S\}$ (the “translate of $S$ through distance $r$”).
We now show why these properties hold for all measurable sets.

Properties of Lebesgue measure. Lebesgue measure $\mu$ on [0,1] is countably
additive and translation invariant. That is:


1. If $S_1, S_2, S_3, \ldots$ are disjoint measurable sets then
$\mu(S_1 \cup S_2 \cup S_3 \cup \cdots) = \mu(S_1) + \mu(S_2) + \mu(S_3) + \cdots$.


2. If $S$ is Lebesgue measurable and $S + r = \{x + r : x \in S\}$ then $\mu(S + r) = \mu(S)$.


Proof. 1. As we showed in the proof of the approximation theorem, by approximating
each $S_n$ within $\varepsilon/2^{n+1}$ by a finite union of intervals $I_n$ we can approximate
$S_1 \cup S_2 \cup S_3 \cup \cdots$ within $\varepsilon$ by a finite union of intervals $I_j$. Thus, the measure
$\mu(S_1 \cup S_2 \cup S_3 \cup \cdots)$ can differ from the sum of measures $\mu(S_1) + \mu(S_2) + \mu(S_3) + \cdots$
by at most $2\varepsilon$. Letting $\varepsilon \to 0$ we get


$\mu(S_1 \cup S_2 \cup S_3 \cup \cdots) = \mu(S_1) + \mu(S_2) + \mu(S_3) + \cdots$.


2. It is immediate from the definition $\mu([a, b]) = b - a$ that $\mu([a + r, b + r]) = b - a$.
Hence, if $I_1, \ldots, I_k$ are intervals whose union approximates $S$ within measure $\varepsilon$,
then $I_1 + r, \ldots, I_k + r$ are intervals whose union approximates $S + r$ within $\varepsilon$.
Consequently, $\mu(S + r) = \mu(S)$. □


We have defined measurability only for subsets of [0,1] for the sake of convenience:
it ensures that measurability is the same as “having finite measure.” We
can also define measurability for subsets of $\mathbb{R}$; for example, by saying that $S$ is
measurable if and only if each set $S \cap [n, n + 1]$ is of finite measure.


Exercises 


Without loss of generality we can take all the intervals whose finite unions approximate measurable 
sets $S$ to be closed. Thus, the finite union $F_n$ that approximates $S$ within measure $1/n$ is a closed
set.


9.3.1 If we define $\lim_{n \to \infty} F_n$ to be $\{x : x \in F_n \text{ for all sufficiently large } n\}$, explain why $\lim_{n \to \infty} F_n$
differs from $S$ by a set of measure 0.

9.3.2 Show that $\lim_{n \to \infty} F_n = \bigcup_m \bigcap_{n \ge m} F_n$. Hence, show that $\lim_{n \to \infty} F_n$ is a $\Sigma_2$ set.

9.3.3 Deduce that any measurable set differs by measure 0 from a $\Sigma_2$ set.


Other closure properties of the measurable sets may be established as countable union was. 


9.3.4 Show closure under complement and that $\mu([0, 1] - S) = 1 - \mu(S)$. Deduce closure under
countable intersection.


Approximation by finite unions of intervals implies the following “0-1 law.” 


9.3.5 Suppose that a set $S \subseteq [0,1]$ has uniform density $d$ in the sense that $\mu(S \cap I)/\mu(I) = d$ for
each interval $I$. Show that $d = 0$ or $d = 1$.


9.4 Functions Continuous Almost Everywhere


One of the most important insights afforded by Lebesgue measure is that it is 
more important for properties to hold “almost everywhere” (i.e., everywhere but 
for a set of zero Lebesgue measure) than everywhere. For example, we have the 
theorem that any continuous function on a closed interval is Riemann integrable 
(Sect. 4.8). However, this does not completely characterize Riemann integrability 
because certain discontinuous functions are also Riemann integrable. Some of them 
look extremely discontinuous from a nineteenth-century point of view; for example, 
the Thomae function on [0,1]: 


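$f(x) = \begin{cases} \dfrac{1}{n} & \text{if } x = \dfrac{m}{n} \text{ for integers } m, n \text{ (in lowest terms, with } n > 0\text{)} \\ 0 & \text{if } x \text{ is irrational.} \end{cases}$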


As we saw in the exercises to Sect. 4.2, f(x) is discontinuous at every rational point, 
so it has a dense set of discontinuities. However, f(x) is continuous at all irrational 
points, which we now realize is almost everywhere. And, as we also saw in the 
exercises to Sect. 4.8, f(x) is Riemann integrable. 
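The Riemann integrability of the Thomae function can be explored numerically. In the sketch below (the function names `thomae`, `sup_on_interval`, and `upper_sum` are hypothetical, and rationals are represented by Python's `Fraction`, which always stores them in lowest terms), the supremum of $f$ on $[a, b]$ is $1/q$ for the least denominator $q$ of a rational in $[a, b]$. Every subinterval contains irrationals, so all lower sums are 0 and the upper sum is also the difference between upper and lower sums.

```python
import math
from fractions import Fraction

def thomae(x):
    """Thomae function at a rational x given as a Fraction (stored in lowest terms)."""
    return Fraction(1, x.denominator)

def sup_on_interval(a, b):
    """Supremum of the Thomae function on [a, b]: 1/q for the least q with some p/q in [a, b]."""
    q = 1
    while math.ceil(a * q) > math.floor(b * q):  # no integer p with a*q <= p <= b*q
        q += 1
    return Fraction(1, q)

def upper_sum(n):
    """Upper Riemann sum over the partition of [0, 1] into n equal subintervals."""
    width = Fraction(1, n)
    return sum(sup_on_interval(i * width, (i + 1) * width) * width for i in range(n))

# The upper sums (equal to upper minus lower sums here) can be made small by refining.
for n in (4, 100, 1000):
    print(n, float(upper_sum(n)))
```

Only finitely many rationals have denominator below any given bound, so for fine partitions most subintervals have a small supremum; this is the mechanism behind the shrinking upper sums.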

This is no accident. The bounded Riemann integrable functions on [a,b] 
are precisely those that are continuous almost everywhere. Thus, by relaxing 
“continuous” to “continuous almost everywhere” (together with the more 
trivial boundedness condition) we can exactly capture the concept of Riemann 
integrability. 

To prepare for a proof of this characterization of Riemann integrability, in this 
section we explore the properties of almost everywhere continuous functions. This 
involves a loosening of the concept of continuity called $\alpha$-continuity.


Definition. The function $f$ is $\alpha$-continuous at $x = c$ if there is a $\delta > 0$ such that


$y, z \in (c - \delta, c + \delta) \Rightarrow |f(y) - f(z)| < \alpha$.


Thus, there is a neighborhood of $x = c$ in which $f(x)$ varies by less than $\alpha$. A few
results follow immediately from this definition.


1. If f is continuous at x = c, then for every natural number n, there is a 
neighborhood of x = c in which f(x) varies by less than 1/n. In other words, 
for every n, f is 1/n-continuous at x = c. 

2. Therefore, if f is not continuous at x = c, then for some n, f is not 1/n- 
continuous at x = c. 

3. So, if we let


$D = \{x \in \mathbb{R} : f \text{ is not continuous at } x\}$,


$D_\alpha = \{x \in \mathbb{R} : f \text{ is not } \alpha\text{-continuous at } x\}$,


then


$D = \bigcup_{n=1}^{\infty} D_{1/n}$.


4. Also, each set $D_\alpha$ is closed, because it contains all its limit points. Namely, if
$c_1, c_2, c_3, \ldots \in D_\alpha$, then $f(x)$ varies by at least $\alpha$ in each neighborhood of each
$c_i$. Consequently, if $c$ is the limit of the $c_i$, then each neighborhood of $c$ contains
points $c_i$, and hence neighborhoods in which $f(x)$ varies by at least $\alpha$. Thus,
$c \in D_\alpha$ also.


Note that the set $D$ of points of discontinuity may not be closed. The Thomae
function, for example, is discontinuous at precisely the rational points. The sets $D_\alpha$
are easier to work with in this respect, and in the next section we will take advantage
of the fact that $D$ is a countable union of the closed sets $D_{1/n}$.


9.4.1 Uniform $\alpha$-Continuity


Another ingredient we will need in the next section is an analogue of the theorem
from Sect. 4.7 that continuity on a compact set implies uniform continuity. The
analogue is that $\alpha$-continuity on a compact set $K$ implies uniform $\alpha$-continuity on
$K$, defined as follows.


Definition. A function $f$ is uniformly $\alpha$-continuous on the set $S$ if there is a $\delta > 0$ such
that, for any $y, z \in S$,


$|y - z| < \delta \Rightarrow |f(y) - f(z)| < \alpha$.


The theorem then reads:


Compactness and uniform $\alpha$-continuity. An $\alpha$-continuous function on a compact
set is uniformly $\alpha$-continuous. □


The proof is completely analogous to the proof for ordinary continuity. 


Exercises 


9.4.1 Find $D_{1/n}$ for the Thomae function $f(x)$.
9.4.2 Hence show that each $D_{1/n}$ is finite for $f(x)$.
9.4.3 Use $D = \bigcup_{n=1}^{\infty} D_{1/n}$ to give a new proof that the rationals in [0,1] form a countable set.


An important class of functions that are continuous almost everywhere are the monotonic
functions.


9.4.4 If $f(x)$ is monotonic, prove that $D_{1/n}$ has measure 0, hence that $D$ has measure 0.


Now we are ready to prove Lebesgue’s theorem that a bounded function $f$ on $[a, b]$
is Riemann integrable $\Leftrightarrow$ $f$ is continuous almost everywhere. We use the concept of
$\alpha$-continuity and the discontinuity sets $D_\alpha$ and $D$ from the previous section. Thus,
to say that $f$ is continuous almost everywhere is to say that $\mu(D) = 0$ and, since
$D = \bigcup_n D_{1/n}$, the latter condition is equivalent to $\mu(D_{1/n}) = 0$ for all $n$.

Without loss of generality we can replace [a, b] by [0, 1], because translating and 
rescaling the domain of a function does not affect its Riemann integrability. We also 
split the theorem into its two directions. The harder direction is: 


Almost continuity implies Riemann integrability. If $f$ is bounded on [0, 1] and
$\mu(D) = 0$, then $f$ is Riemann integrable.


Proof. Suppose we have the bound $M > |f(x)|$ for $f$ on [0,1]. We set


$\alpha = \varepsilon/2$


and first show that there are disjoint open intervals $I_1, I_2, \ldots, I_k$ of total length less
than $\varepsilon/4M$ such that


$D_\alpha \subseteq I_1 \cup I_2 \cup \cdots \cup I_k$.


This is because $D_\alpha$ is compact and of measure 0. Since $D_\alpha$ is of measure 0, we can
cover it by an open set (i.e., a union of open intervals $I$) of arbitrarily small measure,
namely, the finite union $F$ of intervals that approximate $D_\alpha$ within measure $\varepsilon$, plus
the union of intervals of total length $< \varepsilon$ that cover $D_\alpha - F$. And since $D_\alpha$ is compact,
finitely many of the intervals $I$ also cover $D_\alpha$, and the union of these is a disjoint
union of certain open intervals $I_1, I_2, \ldots, I_k$. By choosing the open set covering $D_\alpha$
to have measure less than $\varepsilon/4M$ we then have


$\mu(I_1 \cup I_2 \cup \cdots \cup I_k) < \varepsilon/4M$.


Now let $K = [0,1] - (I_1 \cup I_2 \cup \cdots \cup I_k)$. Since all points of $K$ are outside the set
$D_\alpha$, where $f$ is not $\alpha$-continuous, $f$ is $\alpha$-continuous on $K$. Also, $K$ is closed (being
the complement of the open set $I_1 \cup I_2 \cup \cdots \cup I_k$) and bounded, hence compact, so
$f$ is in fact uniformly $\alpha$-continuous on $K$, by the theorem in the previous section.

Recall that uniform $\alpha$-continuity means that there is a $\delta > 0$ such that, for any
$y, z \in K$,


$|y - z| < \delta \Rightarrow |f(y) - f(z)| < \alpha = \varepsilon/2$.


Thus, if we divide $K$ into finitely many subintervals of length less than $\delta$, the
difference between the lub and glb of $f(x)$ in each is less than $\varepsilon/2$. Consequently,
the difference between the upper and lower Riemann sums for $f$ on $K$ is also less
than $\varepsilon/2$ (since the length of $K$ is at most 1).

Finally, consider upper and lower Riemann sums on $I_1 \cup I_2 \cup \cdots \cup I_k$. Since this
set has length at most $\varepsilon/4M$ and $|f(x)| < M$, the difference between lub and glb of
$f$ on $I_1 \cup I_2 \cup \cdots \cup I_k$ is at most $2M$, hence the difference between upper and lower
Riemann sums is at most


$2M \cdot \varepsilon/4M = \varepsilon/2$.


Adding the contributions from $K$ and $I_1 \cup I_2 \cup \cdots \cup I_k$ we find that the difference
between upper and lower Riemann sums for $f$ over [0,1] is at most $\varepsilon$. Since $\varepsilon$ is
arbitrary, this means $f$ is Riemann integrable. □


The easier direction is: 


Riemann integrability implies almost continuity. If $f$ is Riemann integrable on
[0, 1], then the set $D$ of discontinuities of $f$ has measure 0.


Proof. Since $D = \bigcup_{n=1}^{\infty} D_{1/n}$, it suffices to show that each $D_\alpha$ has measure 0. Given
$\alpha$ and any $\varepsilon > 0$ we choose a partition of [0,1] for which the difference between
upper and lower Riemann sums for $f$ is less than $\alpha\varepsilon$.

It follows that the intervals of the partition on which $f$ varies by $\alpha$ or more have
total length less than $\varepsilon$. These intervals cover the set $D_\alpha$, so $D_\alpha$ is contained in a
finite union of intervals with arbitrarily small total length. Thus, $D_\alpha$ has measure 0,
as required. □
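The step from the Riemann sums to the total length can be spelled out as a chain of inequalities. The symbols $u_i$ and $l_i$ (the lub and glb of $f$ on the $i$th interval of the partition), $\Delta x_i$ (the length of that interval), and $B$ (the set of indices $i$ with $u_i - l_i \ge \alpha$) are notation assumed here for the sketch; they do not appear in the text.

```latex
\alpha \sum_{i \in B} \Delta x_i
  \;\le\; \sum_{i \in B} (u_i - l_i)\,\Delta x_i
  \;\le\; \sum_{i} (u_i - l_i)\,\Delta x_i
  \;<\; \alpha\varepsilon ,
\qquad\text{hence}\qquad
\sum_{i \in B} \Delta x_i < \varepsilon .
```

The first inequality uses $u_i - l_i \ge \alpha$ for $i \in B$, the second uses that every term of the full sum is non-negative, and the third is the choice of partition.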


Exercises 


If f is continuous almost everywhere on [a, b] then the same is true on any subinterval of the form 
[a, x], hence the Riemann integral 


$F(x) = \int_a^x f(t)\,dt$


exists for any $x$ in $[a, b]$. We now investigate the extent to which the fundamental theorem of
calculus holds for the almost continuous functions $f$. (We showed that it holds at all points $x$ for
continuous $f$ in Sect. 4.8.)


9.5.1 Show that $F'(x) = \lim_{h \to 0} \frac{1}{h} \int_x^{x+h} f(t)\,dt$ if the limit exists.
9.5.2 Show that the limit equals $f(x)$ almost everywhere.


9.6 Vitali’s Nonmeasurable Set 


The first example of a nonmeasurable set was discovered by Vitali (1905). Its 
existence depends on a fairly strong form of AC—a well-ordering of R—but, in fact, 
all examples of nonmeasurable sets depend on rather strong forms of AC. We give 
another interesting example in the next section.

The Vitali nonmeasurable set. Assuming that there is a well-ordering of $\mathbb{R}$, there
is a set $V \subseteq [0, 1]$ that is not Lebesgue measurable.


Proof. The set $V$ is most naturally viewed as a subset of the circle (of circumference
1), rather than as a subset of [0,1]. For this reason, we imagine [0,1] turned
into a circle $C$ by joining 0 to 1, so each $x \in [0, 1)$ is associated with the point on
the circle $C$ at angle $2\pi x$.

In particular, each rational $q \in [0, 1)$ corresponds to the point at angle $2\pi q$. We
now call points $x_1$ and $x_2$ equivalent if $x_1 - x_2$ is rational. This equivalence relation
partitions the circle $C$ into equivalence classes, where


$x_1, x_2$ belong to the same class $\Leftrightarrow$ $x_1 - x_2$ is rational.


For example, the rational points in $[0, 1)$ form one such class. If $x_1 - x_2$ is not rational
then $x_1$ and $x_2$ belong to distinct classes, with no common point $x$ (otherwise $x_1 - x$
and $x_2 - x$ would both be rational, in which case $x_1 - x_2$ would be rational too).

It follows from AC for subsets of $\mathbb{R}$ (and hence also from a well-ordering of $\mathbb{R}$)
that there is a set $V$ that includes exactly one member from each equivalence class $E$.
In particular, $V$ includes exactly one rational point. Now, for each rational $q$ we let


$V + q = \{x + q : x \in V\}$.


Here $x + q$ denotes addition “on the circle,” or mod 1, so that $x + q$ corresponds to the
point on the circle obtained from the point at angle $2\pi x$ by rotating through angle
$2\pi q$. Thus, $V + q$ is simply the result of rotating the set $V$ through angle $2\pi q$.

We now observe that the sets $V + q$, for rational $q \in [0, 1)$, have the following
properties.


1. The sets $V + q_1$ and $V + q_2$ are disjoint if $q_1 \ne q_2$.

If $x \in V + q_1$ and $x \in V + q_2$ then $x - q_1, x - q_2 \in V$, so $x - q_1$ and $x - q_2$ are
either identical or inequivalent, because distinct members of $V$ belong to distinct
equivalence classes. It follows that $q_1$ and $q_2$ are identical or inequivalent. But $q_1$
and $q_2$ cannot be inequivalent, since their difference is rational, so we must have
$q_1 = q_2$.

2. The union of the sets $V + q$, over rational $q \in [0, 1)$, is the whole circle $C$.

Any $x \in [0, 1)$ is equivalent to some $x' \in V$, since $V$ includes a member of
each equivalence class. But then $x = x' + q$ for some rational $q$, in which case
$x \in V + q$.

3. If $V$ is Lebesgue measurable, then so is each $V + q$, and $\mu(V + q) = \mu(V)$.

Because $V + q$ is a translate of $V$, and $\mu$ is translation invariant.


Now suppose that $V$ is Lebesgue measurable, in which case we have either
$\mu(V) = 0$ or $\mu(V) = \varepsilon > 0$.




If $\mu(V) = 0$, then the union of the sets $V + q$, for rational $q$, is a countable union
of measure 0 sets by property 3, and hence also of measure 0. This is impossible,
by property 2, since $C$ has measure 1. If $\mu(V) = \varepsilon > 0$ then $C$ is a countable disjoint
union (by property 1) of sets of measure $\varepsilon$, so $C$ has infinite measure, which is also
impossible.

Thus, the necessary conclusion is that $V$ is not Lebesgue measurable. □
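Under the assumption that $V$ is measurable, properties 1 to 3 and countable additivity combine into a single chain. The display below is only a compact restatement of the two cases in the proof, written with the sum indexed by the rationals $q \in \mathbb{Q} \cap [0, 1)$.

```latex
1 = \mu(C)
  = \mu\Bigl(\bigcup_{q \in \mathbb{Q} \cap [0,1)} (V + q)\Bigr)
  = \sum_{q \in \mathbb{Q} \cap [0,1)} \mu(V + q)
  = \sum_{q \in \mathbb{Q} \cap [0,1)} \mu(V)
  \in \{0, \infty\}.
```

A countably infinite sum of equal non-negative terms is either 0 or infinite, so it can never equal 1.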


Exercises 


A somewhat similar nonmeasurable set, again in the circle of circumference 1, may be obtained 
from an irrational rotation in place of the rational rotations in Vitali’s example. 

We let $s$ be an irrational number, so adding $s$ mod 1 amounts to rotation through $2\pi s$. Now
prove the following. 


9.6.1 For any point x, the points in 
orbit of x = {x,x+ s,x+2s,...} 


are all different. 
9.6.2. For any points x and y, the orbits of x and y are disjoint or identical. 
9.6.3 If X (obtained by AC) includes exactly one point from each orbit, show that 


are disjoint sets that fill the circle. 
9.6.4 Conclude, as in Vitali’s example, that X is nonmeasurable. 


9.7 Ultrafilters and Nonmeasurable Sets 


A new kind of nonmeasurable set was introduced by Sierpiński (1938), based on
the existence of nonprincipal ultrafilters, which had been proved by Tarski (1930).
Sierpiński used an ultrafilter to construct a set $W \subseteq [0,1]$ with the following two
properties.


1. For each $x \in [0,1]$, $x \in W \Leftrightarrow 1 - x \notin W$.
Thus, in some sense, $W$ includes half the points in [0, 1], so we should have
$\mu(W) = \frac{1}{2}$, if $W$ is measurable at all.
2. If [0, 1] is divided into $2^n$ equal intervals $I_k$, then each $W \cap I_k$ is a translate of
$W \cap I_1$, so $\mu(W \cap I_k) = \mu(W \cap I_1)$.
Thus, we should also have $\mu(W \cap I_k) = \frac{1}{2}\mu(I_k)$. This conflicts with the
definition of measurable sets in Sect. 9.3, according to which they can be
approximated within measure $\varepsilon$ by finite unions of intervals.


The proof associates each set $X \in U$ with a number $x \in [0, 1]$, namely,

$$x = \sum_{n \in X} 2^{-n}.$$

However, in a countable number of cases, two sets $X$ are associated with the same
$x$. For example,

$$\frac{1}{2} = 2^{-1} = 2^{-2} + 2^{-3} + 2^{-4} + \cdots,$$

so the sets {1} and {2, 3, 4, ...} both give $x = \frac{1}{2}$. This situation occurs only for
numbers $x$ expressible as a finite sum of powers $2^{-n}$ (equivalently, numbers with
a finite binary expansion). Since there are only countably many such $x$, the set of
them has measure 0 and may be ignored. We do so in the proof below, where $[0, 1]^*$
denotes the interval [0, 1] minus the points with finite binary expansions.
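
The correspondence between sets and numbers can be checked on finite examples. The following is a minimal Python sketch, not part of the original text; the function name `number_of` is an illustrative assumption. It computes the sum of $2^{-n}$ over a finite set exactly, and shows that the sets {2, 3, ..., N} fall short of 1/2 by exactly $2^{-N}$, which is why the infinite set {2, 3, 4, ...} gives the same $x$ as {1}.

```python
from fractions import Fraction

def number_of(X):
    """Return the sum of 2**(-n) over n in the finite set X, as an exact fraction."""
    return sum((Fraction(1, 2**n) for n in X), Fraction(0))

half = number_of({1})
for N in (5, 10, 20):
    tail = number_of(range(2, N + 1))
    # The gap half - tail equals 2**(-N), which tends to 0 as N grows.
    print(N, half - tail)
```

Exact rational arithmetic is used so that the gap is seen to be precisely $2^{-N}$ rather than a rounded approximation.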

The omitted points $x$ may also be described as those with binary expansions
that terminate in an infinite sequence of 1s. Thus, they correspond precisely to the
cofinite sets $X$, which form the filter we extend to get $U$. The points $x \in [0, 1]^*$ that
we consider are therefore those corresponding to the $X$ in $U$ that are added to the
cofinite filter to make an ultrafilter.


Ultrafilter-based nonmeasurable set. If $U$ is an ultrafilter over $\mathbb{N}$ extending the
cofinite filter, and if $x = \sum_{n \in X} 2^{-n}$, then the $x \in [0, 1]^*$ for $X \in U$ comprise a
nonmeasurable set $W$.


Proof. Since $[0, 1]^*$ omits all numbers that are finite sums of terms $2^{-n}$, each $x \in W$
includes infinitely many terms. Each $x \in W$ also omits infinitely many terms $2^{-n}$,
since any $x$ that omits only finitely many terms $2^{-n}$ can be rewritten as a finite sum
of such terms.

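Now $\sum_{n \in \mathbb{N}} 2^{-n} = 1$, so it follows for each $x \in W$ that

$$1 - x = 1 - \sum_{n \in X} 2^{-n} = \sum_{n \notin X} 2^{-n} \notin W,$$

since $U$ is an ultrafilter and hence $X \in U \Leftrightarrow \mathbb{N} - X \notin U$. Thus,

$$x \in W \Leftrightarrow 1 - x \in [0, 1]^* - W,$$

which means that $[0, 1]^* - W$ is the reflection of $W$ in $x = \frac{1}{2}$. So both sets have the
same measure, if they are measurable at all.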

We now prove the second property that follows from the measurability of $W$: for
any subdivision of [0, 1] into $2^m$ equal subintervals $I_k$, $W \cap I_k$ has the same measure
for each $k$. In fact, we show that each $W \cap I_k$ is a translate of $W \cap I_1$. This follows
by induction when we prove that each $W \cap I_{k+1}$ is the translate of $W \cap I_k$ through
distance $2^{-m}$, which amounts to proving

$$x \in W \Leftrightarrow x + 2^{-m} \in W. \qquad (*)$$

To prove the latter claim, we go back to the definition of $U$, in order to show that

$$X \in U \Leftrightarrow X' \in U \quad \text{for any sets } X, X' \text{ that differ by a finite set.}$$

Certainly

$$X \in U \Rightarrow X \cup F \in U \quad \text{for any finite set } F,$$

because any filter is closed under supersets. But also

$$X \in U \Rightarrow X - F \in U \quad \text{for any finite set } F,$$

because

$$X - F = X \cap (\mathbb{N} - F) = X \cap (\text{cofinite set}) \in U,$$

since $U$ includes all cofinite sets and is closed under intersections.
The consequent property of the numbers $x$ is that

$$x \in W \Leftrightarrow x' \in W$$

for any $x$, $x'$ whose expressions as $\sum 2^{-n}$ differ in finitely many terms. Thus, to prove
(*) it suffices to prove that $x + 2^{-m}$ differs from $x$ in only finitely many terms. This
is easy to check (see exercises).

Now we are ready to decide whether $W$ is measurable. If it is, we know that
$\mu(W) = \frac{1}{2}$. In that case, since $\mu(W \cap I_k)$ is the same for each $k$, we have
$\mu(W \cap I_k) = \frac{1}{2}\mu(I_k)$. This contradicts the definition of Sect. 9.3, that any measurable set can be
approximated by a finite union of intervals. (In more detail: if $W$ is measurable,
approximate it within $\varepsilon/2$ by a finite union of intervals $J$. Then, by choosing $m$
sufficiently large, approximate the union of the intervals $J$ within $\varepsilon/2$ by a suitable
union of intervals among the $2^m$ intervals $I_k$. This contradicts the second property of
$W$, according to which only half the measure of the intervals $I_k$ belongs to $W$.) □
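
The fact used for (*), that adding $2^{-m}$ changes only finitely many terms, can be illustrated by the usual binary carry. The Python sketch below is an illustration under our own assumptions, not the author's proof: the names `value` and `add_power` are hypothetical, and the set of exponents is passed as a finite Python set. Only exponents $n \le m$ are ever inspected or changed, so the same steps would apply to an infinite set of exponents.

```python
from fractions import Fraction

def value(X):
    """Exact value of the sum of 2**(-n) over the finite set X."""
    return sum((Fraction(1, 2**n) for n in X), Fraction(0))

def add_power(X, m):
    """Return X2 with value(X2) == value(X) + 2**(-m), changing only indices <= m."""
    X2 = set(X)
    n = m
    while n in X2:        # 2**(-n) + 2**(-n) = 2**(-(n-1)): carry one place left
        X2.remove(n)
        n -= 1
    if n == 0:
        raise ValueError("the sum would reach 1 or more")
    X2.add(n)
    return X2

X = {2, 3, 5}
X2 = add_power(X, 3)
print(sorted(X2), value(X2) - value(X))   # the difference is 1/8
print(sorted(X ^ X2))                      # indices that changed, all <= 3
```

The symmetric difference `X ^ X2` lists exactly the terms in which the two sums differ.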


Exercises 


Given $x = 2^{-n_1} + 2^{-n_2} + 2^{-n_3} + \cdots$ with $n_1 < n_2 < n_3 < \cdots$, we wish to show that $x' = x + 2^{-m}$
is a sum of distinct powers $2^{-n}$, differing from the sum for $x$ in only finitely many terms. This is
obviously true for $x' = x + 2^{-m}$ when $x$ does not include the term $2^{-m}$, and for $x' = x - 2^{-m}$ when
$x$ does include the term $2^{-m}$. We now consider the remaining cases.


9.8.1 Suppose that $x = 2^{-n_1} + 2^{-n_2} + 2^{-n_3} + \cdots$, with $n_1 < n_2 < n_3 < \cdots$, includes the term $2^{-m}$.
Explain why $x' = x + 2^{-m}$ can be written as a sum of distinct powers $2^{-n}$ that differs from
the sum for $x$ only in terms $2^{-n}$ with $n \le m$.

9.8.2 Explain why $2^{-1} - 2^{-4} = 2^{-2} + 2^{-3} + 2^{-4}$.


9.8.3 If $n < m$, show that

$$2^{-n} - 2^{-m} = 2^{-(n+1)} + 2^{-(n+2)} + \cdots + 2^{-m}.$$

9.8.4 Suppose that $x = 2^{-n_1} + 2^{-n_2} + 2^{-n_3} + \cdots$ with $n_1 < n_2 < n_3 < \cdots$ and that $x$ does not include
the term $2^{-m}$. Show, using Exercise 9.8.3 or otherwise, that $x' = x - 2^{-m}$ can be written as a
sum of distinct powers $2^{-n}$ that differs from the sum for $x$ in only the terms with $n \le m$.


The proof above—that W is not measurable—assumes “reflection invariance” of Lebesgue 
measure in claiming that $W$ has the same measure as $[0, 1]^* - W$.


9.8.5 Formulate “reflection invariance” precisely, and explain why it holds for the Lebesgue 
measure. 


9.8 Historical Remarks 


The two founders of measure theory, as we know it today, were Borel and Lebesgue. 
As mentioned in Sect. 7.9, Borel (1898) saw that it is natural to expect countable 
additivity for any concept of measure, and that this implies the measurability 
of all Borel sets. He also emphasized the immediate consequence of countable 
additivity, that countable sets have measure 0; in other words, that all sets of positive 
measure are uncountable. Finally, he was aware of the potential inconsistency of the 
measure concept—in the sense that different constructions of the same set could 
give different values of its measure—and he saw that inconsistency is averted by 
the Heine–Borel theorem. An interval of positive measure, such as [0,1], cannot be
covered by intervals $I_n$ of total measure $\varepsilon < 1$, since the Heine–Borel theorem then gives
a covering of [0,1] by finitely many $I_n$ of total measure at most $\varepsilon$, which is clearly
impossible.

This is why Borel called the Heine–Borel theorem the “first fundamental theorem
of measure theory.” Borel (1950) also spoke of the “second fundamental theorem of
measure theory,” meaning the fact that every measurable set can be approximated
within $\varepsilon$ by a finite union of intervals. Lebesgue (1902) took this property as the
definition of a measurable set in [0,1], much as we did in Sect. 9.3, and used it to 
prove the fundamental properties of measure, such as countable additivity. 

Lebesgue extended Borel’s ideas on measure in two ways. First, he extended the 
concept of measure beyond the Borel sets by including all subsets of measure 0 sets 
among the measure 0 sets. As we now know, this extends measure to all subsets of 
[0,1], except for sets called into being by AC. Second, Lebesgue applied the concept 
of measure to analysis, particularly to the theory of integration. Lebesgue’s concept 
of integral, now called the Lebesgue integral, is an extension of the Riemann integral 
to a much larger class of functions. Just as Lebesgue measure covers all Borel sets 
plus sets that are “almost Borel,” the Lebesgue integral covers all bounded Baire 
functions plus such functions that are “almost Baire.” This includes the Dirichlet 
function, and many other functions that are not Riemann integrable. Moreover, the 
Lebesgue integral has various limit properties that fail for the Riemann integral. 
Among them are:

Monotone Convergence Theorem. If $f_1 \le f_2 \le f_3 \le \cdots$ is a sequence of
Lebesgue integrable functions that converge almost everywhere on $[a, b]$ to $f$,
and if $\int_a^b f_n(x)\,dx < A$ for each $n$, then

$$\int_a^b f(x)\,dx = \lim_{n \to \infty} \int_a^b f_n(x)\,dx.$$

Dominated Convergence Theorem. If $f_1, f_2, f_3, \ldots$ are Lebesgue integrable
functions that converge almost everywhere on $[a, b]$ to $f$, and if there is a
Lebesgue integrable $g$ with $|f_n| \le g$ almost everywhere in $[a, b]$, then $f$ is
Lebesgue integrable and

$$\int_a^b f(x)\,dx = \lim_{n \to \infty} \int_a^b f_n(x)\,dx.$$

Both of these theorems fail for the Riemann integral if we take

$$f_n(x) = \begin{cases} 1 & \text{on the first } n \text{ rational points } x \\ 0 & \text{elsewhere,} \end{cases}$$

so $\lim_{n \to \infty} f_n$ is the Dirichlet function, which is not Riemann integrable, even though
each Riemann integral $\int_a^b f_n(x)\,dx = 0$.

Lebesgue’s concept of “almost everywhere” corrects many cases of bad behavior 
previously thought to be incorrigible. We have already seen the example of the 
Thomae function, which looks badly discontinuous, but is actually continuous 
almost everywhere. Other cases of badness shown to be “good almost everywhere” 
by Lebesgue concern the differentiability of continuous functions and the fundamental
theorem of calculus:


• Any monotonic continuous function is differentiable almost everywhere.
• If $f$ is integrable and $F(x) = \int_a^x f(t)\,dt$ then $F$ is differentiable almost everywhere
  and $F'(x) = f(x)$ almost everywhere.


Lebesgue’s results changed the face of analysis by greatly expanding the scope 
of the operations of integration and differentiation. They also drew attention to the 
underlying concept of measure, and raised the question of nonmeasurable sets. The 
example of Vitali (1905) was challenging in its simplicity, despite its reliance on 
AC, which Lebesgue did not accept. More challenging examples were to come. 

Hausdorff (1914), p. 469, gave a decomposition of the sphere into three pieces 
A, B, and C such that: 


• A is congruent to B (by a rotation of the sphere),
• A is congruent to C (by a rotation of the sphere),
• A is congruent to $B \cup C$ (by a rotation of the sphere).

Fig. 9.1 Giuseppe Vitali


It follows that subsets of the sphere cannot even be given a finitely additive measure
$\mu$, if measure is assumed to be rotation-invariant.¹ If they could, we should have
nonzero numbers satisfying the contradictory equations

$$\mu(A) = \mu(B) = \mu(C) = \mu(B) + \mu(C).$$


Hausdorff’s sets A, B, and C are determined with the help of AC. Also using AC, 
Banach and Tarski (1924) used Hausdorff’s idea to devise the ultimate affront to 
common sense: a decomposition of the three-dimensional unit ball into a finite 
number of subsets, which can be rigidly moved to form two unit balls. Accounts of 
this “Banach-Tarski paradox” may be found in Wagon (1993) and Wapner (2005). 
Figure 9.2 shows Banach and Tarski around the year 1919, five years before the 
Banach-Tarski theorem. 

As mentioned in Chap. 6, Gödel (1938) proved that AC is consistent with the ZF
axioms, so the Banach-Tarski paradox is not a contradiction (unless the ZF axioms 
are themselves contradictory). Nevertheless, one might hope that there are other 
axioms with some of the benefits of AC without its counterintuitive consequences. 
The most deeply studied candidate so far is the axiom of determinacy, AD, which 
we introduced in Chap. 7. 

There we mentioned that AD has some benefits for the theory of R, because
it implies that all subsets of R are Lebesgue measurable and that countable AC
holds for sets of reals. What distinguishes AD from AC is its “higher consistency
strength,” meaning that we have to assume the existence of very large sets to prove
the consistency of ZF+AD. Since AC is normally assumed in these consistency
proofs, one can measure large sets by their cardinal numbers, which means we
assume the existence of large cardinals.

¹ More generally, we would like measure to be invariant under any rigid motions, such as
translations, reflections, and rotations. This is the case for Lebesgue measure in all dimensions.

Fig. 9.2 Stefan Banach and Alfred Tarski

Recall from Sect. 7.9 that the Gödel (1938) theorem about AC is that ZF+AC
is consistent provided only that ZF is consistent. This means that ZF+AC has the 
same consistency strength as ZF. A proposition of somewhat higher consistency 
strength is “all subsets of R are Lebesgue measurable.” As mentioned in Sect. 6.8, 
to prove the consistency of this proposition one needs to assume the existence of an 
inaccessible cardinal—something not provable in ZF alone. Inaccessible cardinals 
are merely the smallest of the large sets whose existence cannot be proved in ZF. 
We have to assume the existence of much larger sets, called Woodin cardinals, to 
prove the consistency of ZF+AD (again, assuming that ZF is consistent). This result 
follows from the theorem of Woodin, mentioned in Sect. 7.9, that AD holds in L(R). 

Another way to deal with the paradoxical sets is to allow them to exist (by 
assuming AC), but to keep them as far as possible from sets we can define, such as 
the Borel sets and those obtained from them by projection and complementation: 
the projective sets. Woodin cardinals also solve this problem. If we assume the 
consistency of

    ZF+AC+“there are infinitely many Woodin cardinals,”

then we get the consistency of

    ZF+AC+“all projective sets are determined.”


This was proved by Martin and Steel (1989). As mentioned in Sect. 7.9, Mycielski
and Świerczkowski (1964) proved that all determined sets are measurable, so Martin
and Steel’s result keeps the nonmeasurable sets outside the projective sets, provided
those infinitely many Woodin cardinals exist.

It is perhaps frustrating to be unable to answer basic questions about the real 
numbers without assuming the existence of astoundingly large sets. But it is surely 
inspiring to know that the human mind can find such a remarkable explanation for 
the gaps in our understanding of R. 


Chapter 10 
Reflections 


PREVIEW 


In this chapter we revisit the fundamental questions raised in the first chapter. We 
review the answers obtained, and reflect on the insights and new questions to which 
they lead. 

The fundamental questions were already implicit in ancient Greek mathematics, 
and the Greeks saw that their difficulties were entangled with the concept of 
infinity—which worried them. On the other hand, they also saw that infinity could 
be used to solve otherwise unapproachable problems, such as finding the area of 
a parabolic segment. But their qualms about infinity prevented them from using 
infinite processes systematically (i.e., from developing calculus).

We now realize that the difficulties of Greek mathematics are concentrated in the 
concept of a real number: a concept that has to meet the needs of both arithmetic 
(counting, adding, multiplying) and geometry (measuring quantities such as length 
and area, modeling continua such as lines and curves). 

To reconcile these demands requires acceptance of infinity, and with it the general 
concepts of set, function, and limit—none of which were known to the Greeks. 
Set theory is a setting where questions about infinity, functions, and limits can be 
answered to a large extent. In particular, ZF set theory (plus certain axioms of 
choice) has given good answers to most of the fundamental questions. But it has 
also shown the questions to be more complicated than first thought. In particular, 
the problem of measuring sets of real numbers is entangled with questions about the 
entire universe of infinite sets.


10.1 What Are Numbers?


The answer to this question turns out to have two parts: 


1. The laws of arithmetic, which govern the behavior of addition, subtraction,
multiplication, and division, arise from the positive integers 1, 2, 3, 4, 5, ..., and
the accompanying principle of induction. These laws extend quite easily to the
integers ..., −2, −1, 0, 1, 2, 3, ... and the rational numbers $m/n$ (where $m$ and $n$
are integers and $n \ne 0$).

2. Since irrational numbers exist (for example, $\sqrt{2}$ and $\pi$), we need to define
irrational numbers and to extend the laws of arithmetic to them. A convenient
way to do this is to define each positive real number as a cut in the set of
positive rational numbers; that is, a partition of the set $\mathbb{Q}^+$ of rational numbers
$> 0$ into sets $L$ and $U$, where each member of $L$ is less than each member
of $U$.

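Each rational number $r$ is thereby represented by the cut $(L, U)$, where

$$L = \{q \in \mathbb{Q}^+ : q \le r\}, \quad U = \{q \in \mathbb{Q}^+ : q > r\},$$

or by the cut $(L', U')$, where

$$L' = \{q \in \mathbb{Q}^+ : q < r\}, \quad U' = \{q \in \mathbb{Q}^+ : q \ge r\}.$$

A cut $(L, U)$ in which $L$ has no maximum member and $U$ has no minimum member
therefore represents an irrational number. For example, $\sqrt{2}$ is represented by
the cut

$$L_{\sqrt{2}} = \{q \in \mathbb{Q}^+ : q^2 < 2\}, \quad U_{\sqrt{2}} = \{q \in \mathbb{Q}^+ : q^2 > 2\}.$$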
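
That the lower set of the cut for $\sqrt{2}$ has no maximum member can be made concrete. The following Python sketch is an illustration rather than the author's argument: the map $q \mapsto (2q + 2)/(q + 2)$ is one standard choice, assumed here, that sends any positive rational with $q^2 < 2$ to a larger rational whose square is still less than 2. The function names are hypothetical.

```python
from fractions import Fraction

def in_lower(q):
    """Membership test for the lower set of the cut representing sqrt(2)."""
    return q > 0 and q * q < 2

def bigger_in_lower(q):
    """Given q in the lower set, return a larger rational still in the lower set."""
    return (2 * q + 2) / (q + 2)

q = Fraction(1)
for _ in range(5):
    q_next = bigger_in_lower(q)
    # Each step stays inside the lower set and strictly increases.
    assert in_lower(q_next) and q_next > q
    q = q_next
print(q, float(q))
```

The choice works because $q' - q = (2 - q^2)/(q + 2)$ and $q'^2 - 2 = 2(q^2 - 2)/(q + 2)^2$, so $q' > q$ and $q'^2 < 2$ whenever $q^2 < 2$.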
Thus, the rational numbers are reinvented, and the irrational numbers are brought
into being, as (pairs of) infinite sets of positive rational numbers. The laws of
arithmetic are then inherited, as it were, from the rational numbers. It suffices to
define the sum of cuts $(L, U)$ and $(L', U')$ as the cut with lower set

$$L + L' = \{q + q' : q \in L \text{ and } q' \in L'\},$$

and the product of cuts as the cut with lower set

$$LL' = \{qq' : q \in L \text{ and } q' \in L'\}.$$


The laws of arithmetic for positive real numbers extend to zero and negative real
numbers without difficulty. One can also prove properties of algebraic numbers,
such as $(\sqrt{2})^2 = 2$ and $\sqrt{2}\sqrt{3} = \sqrt{6}$.

The step from the rational to the real numbers is accomplished by a single axiom of
ZF set theory: the axiom of infinity. More precisely, the theory of rational numbers
under addition and multiplication is essentially the same as ZF−Infinity, the theory
of finite sets. As we explained in Sect. 6.6, the natural numbers can be taken to be
the finite sets

$$0 = \{\}, \quad 1 = \{0\}, \quad 2 = \{0, 1\}, \quad \ldots$$


The successor function is then $S(n) = n \cup \{n\}$, and + and × can be defined by
induction as explained in Sect. 2.2. This embeds the arithmetic of natural numbers 
in the theory of finite sets, and one can extend the theory to integers and rational 
numbers by using ordered pairs, as was also explained in Sect. 6.6. Conversely, all 
finite sets can be encoded as natural numbers, and operations on finite sets, such as 
pairing and union, can be simulated by arithmetic operations. 
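
Both directions of this correspondence can be sketched in a few lines of Python. This is an illustration under stated assumptions: the text does not name a particular encoding of finite sets by numbers, so the sketch uses the Ackermann coding (a set is coded by the sum of $2^c$ over the codes $c$ of its members) as one possible choice, and the function names are hypothetical.

```python
def numeral(n):
    """Von Neumann numeral: 0 = {} and S(k) = k ∪ {k}, built from frozensets."""
    k = frozenset()
    for _ in range(n):
        k = k | frozenset([k])
    return k

def encode(s):
    """Ackermann-style code of a hereditarily finite set: sum of 2**encode(t) for t in s."""
    return sum(2 ** encode(t) for t in s)

def decode(c):
    """Inverse of encode: the set whose members are coded by the 1-bits of c."""
    return frozenset(decode(i) for i in range(c.bit_length()) if (c >> i) & 1)

three = numeral(3)
print(len(three), numeral(2) in three)   # the numeral 3 has three members, one of them 2
print(encode(three), decode(encode(three)) == three)
```

With this coding, membership of one set in another becomes a question about a binary digit of a natural number, which is the sense in which set operations are simulated by arithmetic.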

The axiom of infinity takes us from Q to R and of course much more. ZF 
set theory includes objects far beyond individual real numbers, or the set R, or 
the subsets of R, or the real functions that form the subject matter of analysis. 
Nevertheless, in passing from ZF−Infinity to ZF we do not necessarily overstep
the bounds of analysis. If anything, we need more axioms of set theory, not less, to 
answer questions about R. As we have seen, some form of choice axiom is needed 
to establish the equivalence of ordinary continuity and sequential continuity. And 
some questions about measure and determinacy of subsets of R cannot be settled 
without appealing to large cardinal axioms, that is, assumptions about the size of 
sets that are able to exist. 


10.2 What Is the Line?


While Dedekind’s concept of cut provides an immediate extension of the laws of 
arithmetic from rational to irrational numbers, that was not his main reason for 
introducing it. What he really wanted to do was provide a numerical model of 
the line, and particularly its “continuity,” or what we now call its completeness. 
The rational numbers do not provide a convincing model of the line because they 
have gaps at places such as $\sqrt{2}$. Gaps in the line stymie any attempt to prove basic
theorems about continuous functions, such as the intermediate value theorem. 

For example, the function $f(x) = x^2 - 2$ continuously passes from a negative
value (−2) at $x = 0$ to a positive value (+2) at $x = 2$, without taking the value 0 at
any rational value of x. Thus, even a quadratic function on Q may fail to have the 
intermediate value property. 
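
The failure can be watched numerically. The following Python sketch, offered as an illustration only, runs bisection on $f(x) = x^2 - 2$ using exact rational arithmetic: the bracketing interval shrinks at every step, yet no rational midpoint is ever a zero of $f$. The number of steps (20) is an arbitrary choice.

```python
from fractions import Fraction

def f(x):
    return x * x - 2

lo, hi = Fraction(0), Fraction(2)   # f(lo) < 0 < f(hi)
for _ in range(20):
    mid = (lo + hi) / 2
    assert f(mid) != 0              # no rational has square 2
    if f(mid) < 0:
        lo = mid
    else:
        hi = mid
print(float(lo), float(hi))
```

The endpoints close in on the gap at $\sqrt{2}$ from both sides, but within the rationals there is no point for them to converge to.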

Dedekind’s way of creating a line without gaps is almost absurdly simple: fill 
each gap in the rational numbers by the gap itself! And what better way to realize 
a gap in the rationals than as the pair (L, U) of sets of rationals, respectively, to 
the left, and to the right, of the gap? What made Dedekind’s idea so revolutionary 
was its introduction of sets as bona fide mathematical objects. This was too radical
for Dedekind’s contemporaries, and perhaps even for Dedekind himself, because he
claimed the right to create a new number corresponding to each pair (L, U). Today,
however, we are used to defining mathematical objects as sets, so a pair (L, U) is as 
good a mathematical object as any other. 

Realization of gaps by sets of rational numbers may be the simplest way to 
construct a line without gaps, but the set R of real numbers is not as simple as 
the set Q of rational numbers. As we saw in Chap. 3, Q is countable, like the set 
of natural numbers, but R is not. This makes a sharp distinction between rational 
and real numbers, which was not clear when we casually made the step from 
rational numbers to sets of rational numbers. The uncountability of R exposes the 
enormity of this step, and more generally of the step from any set $S$ to its power set
$P(S) = \{\text{all subsets of } S\}$. As we saw in Sect. 3.8, this step always leads to a set of
higher cardinality, so the power set axiom of ZF—which allows us to view sets as 
objects, and hence as members of another set—is not to be taken lightly. 
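
For a small finite set the jump in cardinality can be verified exhaustively. The Python sketch below is an illustration based on the standard diagonal argument (assumed here, not quoted from Sect. 3.8): for every function $g$ from $S$ to $P(S)$, the set $D = \{s \in S : s \notin g(s)\}$ differs from each $g(s)$, so $g$ cannot be onto. The names `diagonal`, `S`, and `subsets` are hypothetical.

```python
from itertools import product

def diagonal(S, g):
    """Return the set D = {s in S : s not in g(s)}, which differs from every g(s)."""
    return frozenset(s for s in S if s not in g[s])

S = [0, 1, 2]
subsets = [frozenset(x for x, keep in zip(S, bits) if keep)
           for bits in product([False, True], repeat=len(S))]

# Try every function g : S -> P(S); none of them hits its own diagonal set.
for images in product(subsets, repeat=len(S)):
    g = dict(zip(S, images))
    assert diagonal(S, g) not in g.values()
print(len(subsets), "subsets;", len(subsets) ** len(S), "functions checked")
```

The exhaustive check is only possible because $S$ is finite; the point of the diagonal construction is that the same definition of $D$ works for any set $S$.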


10.3 What Is Geometry?


Since ancient times, geometry has been based on the idea of a continuous space 
(originally the line, the plane, or three-dimensional space) with a distance function. 
Distance is the fundamental geometric quantity, since it determines all other 
geometric quantities, such as angle, area, and volume. We now know that the idea 
of a continuous line can be modeled by R, the set of real numbers, and R also gives 
a distance between any two points $x$ and $y$ in R; namely, $|y - x|$.

The plane can then be modeled by the cartesian product of the line with itself, 


$$\mathbb{R}^2 = \{(x_1, x_2) : x_1, x_2 \in \mathbb{R}\},$$


which goes back to the idea (as you can see from the word “cartesian”) of Descartes 
(1637) of using coordinates $x_1, x_2$ to describe points in the plane. As we saw in
Sect. 5.1, the Pythagorean theorem motivates the definition of distance between
points $(x_1, x_2)$ and $(y_1, y_2)$, namely,

$$\sqrt{(y_1 - x_1)^2 + (y_2 - x_2)^2}.$$

More generally, we define the distance between points $(x_1, x_2, \ldots, x_n)$ and
$(y_1, y_2, \ldots, y_n)$ in $\mathbb{R}^n$ to be

$$\sqrt{(y_1 - x_1)^2 + (y_2 - x_2)^2 + \cdots + (y_n - x_n)^2},$$


motivated by the idea of iterating the Pythagorean theorem (using a triangle with
one side in $\mathbb{R}^{n-1}$ and a perpendicular side into $\mathbb{R}^n$). The generalization of distance to
$n$ dimensions is elegantly subsumed by the concept of inner product on $\mathbb{R}^n$:

$$(x_1, x_2, \ldots, x_n) \cdot (y_1, y_2, \ldots, y_n) = x_1 y_1 + x_2 y_2 + \cdots + x_n y_n.$$

We abbreviate the points $(x_1, x_2, \ldots, x_n)$ and $(y_1, y_2, \ldots, y_n)$ by $x$ and $y$, the distance
between $x$ and $y$ by $|y - x|$, and we further abbreviate the distance $|x - 0|$ between $x$
and the origin 0 by $|x|$.

Then we notice that $|x|^2 = x \cdot x$ and, more generally,

$$|y - x|^2 = (y - x) \cdot (y - x).$$


Thus, distance is expressible in terms of the inner product. So too is angle, because
it happens that

$$x \cdot y = |x||y| \cos\theta,$$

where $\theta$ is the angle between the lines from the origin to $x$ and $y$. The inner
product thereby gives the basic concepts of geometry on $\mathbb{R}^n$. In this geometry
the Pythagorean theorem holds (because it is built into the definition of inner
product), and so do the other fundamentals of geometry laid down by Euclid. For
this reason, $\mathbb{R}^n$ with its inner product is called Euclidean space.
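
The way length, distance, and angle all derive from the inner product can be sketched directly in Python. The function names below are illustrative assumptions, and floating-point arithmetic is used, so the results are approximations to the real-number quantities.

```python
import math

def dot(x, y):
    """Inner product x1*y1 + x2*y2 + ... + xn*yn."""
    return sum(a * b for a, b in zip(x, y))

def norm(x):
    """Length |x|, defined by |x|**2 = x . x."""
    return math.sqrt(dot(x, x))

def distance(x, y):
    """Distance |y - x| between the points x and y."""
    return norm([b - a for a, b in zip(x, y)])

def angle(x, y):
    """Angle theta between x and y, from x . y = |x||y| cos(theta)."""
    return math.acos(dot(x, y) / (norm(x) * norm(y)))

x, y = (3.0, 4.0, 0.0), (0.0, 0.0, 2.0)
print(norm(x), distance(x, y), angle(x, y))
```

Here $x$ and $y$ are perpendicular, so the computed angle is a right angle and the squared distance is $|x|^2 + |y|^2$, in agreement with the Pythagorean theorem.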

It is noteworthy how much of this realization (no pun intended) of Euclid’s vision 
builds on ideas already present in Euclid. The Pythagorean theorem is the main 
theorem of Euclid’s Elements, Book I, and the struggle to incorporate irrationals into 
the line is the subject of Euclid’s Book V. What Euclid lacked, as we have seen, was 
sufficient acceptance of infinity to obtain a complete number line R, and sufficient 
acceptance of algebra to admit products of any number of lengths. [Incidentally, it 
was Grassmann (1847) who first proposed algebraic foundations for n-dimensional 
geometry. Thus, Grassmann was a pioneer in the foundations of both arithmetic and 
geometry.]

The linear algebra courses of today, which often present the definition of 
Euclidean space without comment, stand upon the shoulders of giants: Euclid, 
Descartes, Grassmann, Dedekind, Cantor, .... And this is just Euclidean geometry. 
Since the early nineteenth century, there have also been non-Euclidean geometries, 
typically based on R as well. 

Historically, Euclidean geometry was first generalized by considering surfaces in
$\mathbb{R}^3$, and measuring distance between two points on the surface in the natural way.
For example, if one has the unit sphere $S^2$ in $\mathbb{R}^3$ one wants to measure the distance
between points $P$ and $Q$ on $S^2$ by taking the plane through $P$, $Q$ and the center $O$ of
$S^2$, which meets $S^2$ in a circle of radius 1, and measuring the length of the arc from
$P$ to $Q$ on this circle.

More generally, on a smooth surface $S$ in $\mathbb{R}^3$ one hopes that for any two points $P$
and Q on S there will be a geodesic (curve of shortest length) connecting P and Q, 
the length of which can be found by calculus. 

More generally still, one can dispense with the ambient space $\mathbb{R}^3$ or $\mathbb{R}^n$
altogether and simply define a length function $d(P, Q)$ on any space $S$ to be a real-valued
function with reasonable properties. The most general properties that are
geometrically reasonable are the following: for all $P, Q, R \in S$

1. $d(P, Q) \ge 0$ (distance is nonnegative),
2. $d(P, Q) = 0 \Leftrightarrow P = Q$ (points at zero distance are identical),
3. $d(P, Q) = d(Q, P)$ (symmetry), and
4. $d(P, R) \le d(P, Q) + d(Q, R)$ (the triangle inequality).


These properties define what is called a metric space, which is a general setting for 
geometry based on a real-valued length function. 
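
As a concrete check, the four properties can be tested for the great-circle distance on the unit sphere described above. The Python sketch below is illustrative only: it samples a handful of points rather than proving anything, the name `great_circle` is hypothetical, and a small tolerance `tol` is assumed to absorb floating-point rounding.

```python
import itertools
import math

def great_circle(p, q):
    """Arc length between unit vectors p and q on the unit sphere."""
    c = sum(a * b for a, b in zip(p, q))
    return math.acos(max(-1.0, min(1.0, c)))   # clamp against rounding error

points = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0), (-1.0, 0.0, 0.0)]
tol = 1e-12
d = great_circle
for p, q, r in itertools.product(points, repeat=3):
    assert d(p, q) >= 0                          # property 1
    assert (d(p, q) < tol) == (p == q)           # property 2
    assert abs(d(p, q) - d(q, p)) < tol          # property 3
    assert d(p, r) <= d(p, q) + d(q, r) + tol    # property 4
print("all four properties hold on the sample")
```

The antipodal pair in the sample shows that the triangle inequality can hold with equality: the distance between them is the sum of two quarter-circle arcs.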


10.4 What Are Functions? 


The simplest answer is based on the concept of a set, as we already saw in Sect. 1.5. 
However, we now know that the simplicity of this general definition has a cost. Real 
functions become as complicated as sets of real numbers, so their properties depend 
to some extent on which axioms of set theory we accept. 

For example, with a well-ordering of R we have a nonmeasurable set $V \subseteq [0, 1]$
(the Vitali set), and the characteristic function of V is not Lebesgue integrable. If, 
on the other hand, we accept the axiom of determinacy (AD), then all subsets of 
[0,1] are Lebesgue measurable and all (bounded) functions on [0,1] are Lebesgue 
integrable. 

There are also options between these two extremes, concerning the complexity 
possible for nonmeasurable functions. To explain them we recall the concepts of 
Baire function and Borel set from Chap. 8. 

The Baire functions are those obtainable from continuous functions by the limit 
operation, and they include the characteristic functions of all sets in the Borel 
hierarchy. The Borel sets are all Lebesgue measurable, as we saw in Sect. 9.3, 
and this enables us to prove that all (bounded) Baire functions on [0,1] are 
Lebesgue integrable. Thus, nonintegrable functions have greater complexity than 
Baire functions. But we can go a little further. 

We can also prove (in ZF+AC) that all projections of two-dimensional Borel 
sets are measurable. It turns out that this class of sets (called analytic sets) includes 
sets that are not Borel, so Lebesgue measurability extends beyond the Borel sets, 
to the analytic sets and their complements. These sets form the first level of what 
is called the projective hierarchy, whose higher levels arise by taking finitely many 
complements and further projections. 

It follows from the measurability of analytic sets and their complements that the 
corresponding functions are Lebesgue integrable. However, in ZF+AC we cannot 
prove measurability of all sets at higher levels of the projective hierarchy, so 
Lebesgue integrability of the corresponding functions is also not provable. Thus, 
the question of how complex a nonintegrable function must be is essentially the 
question of how complex a nonmeasurable set must be. We pursue this question 
further in Sect. 10.6. 

Suppose, however, that one gives up this pursuit and decides to confine attention 
to the Baire functions. Even here we cannot avoid set theory issues. Just as the
concept of natural number is naturally linked to induction over the finite ordinals,
the Baire functions (and the related Borel sets) are naturally linked to the countable 
ordinals and transfinite induction. And in this domain we cannot avoid using 
countable AC; for example, to prove that a countable union of countable sets is 
countable. 


10.5 What Is Continuity? 


In ordinary speech, the word “continuous” means “unbroken” or “without gaps,” 
as in “a continuous curve” or “continuous progress.” In mathematics we now 
use the term “connected” for this property, and reserve the term “continuous” for 
functions—though the two concepts are certainly related. A continuous function f 
on R has a connected graph, for example, but this is due as much to the completeness 
of R as to the continuity of f. 

Analysis suggests two definitions of continuity: the $\varepsilon$-$\delta$ definition and the
definition via sequences (“sequential continuity”). The two are equivalent only by
virtue of an axiom of choice, namely countable AC, which is needed to prove that
sequential continuity at a point implies $\varepsilon$-$\delta$ continuity at a point. The latter result
was one of the first to raise issues of set theory in analysis. 

However, neither of these definitions was as consequential as the third definition, 
in terms of open sets. Hausdorff’s discovery that a function $f$ is continuous if and
only if $f^{-1}$ of any open set is open became the foundation of the whole discipline
of topology. One begins with the concept of a topological space $S$, which is a set
together with a collection $\mathcal{T}$ of subsets $U$ that are called open. The sets in $\mathcal{T}$ are
open purely by virtue of the following closure properties:


1. The empty set and the whole space $S$ are open.
2. If $U$ and $V$ are open, then $U \cap V$ is open.
3. If $\{U_i\}$ is a collection of open sets, then $\bigcup_i U_i$ is open.


In this setting, one can now talk about continuous functions, homeomorphisms,
closed sets, compact sets, and so on, because all are definable in terms of open
sets. $\mathcal{T}$ is called a topology on the set $S$.
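
On a finite set, both the closure properties and the open-set definition of continuity can be checked mechanically. The following Python sketch is an illustration with hypothetical function names; it relies on the fact that, for a finite family of sets, closure under pairwise unions already gives closure under all unions.

```python
def is_topology(S, T):
    """Check the three closure properties for a family T of subsets of a finite set S."""
    S, T = frozenset(S), {frozenset(U) for U in T}
    if frozenset() not in T or S not in T:
        return False
    # For a finite family, closure under pairwise unions gives all unions.
    return all(U & V in T and U | V in T for U in T for V in T)

def is_continuous(f, T_domain, T_codomain, domain):
    """f (a dict) is continuous iff the preimage of every open set is open."""
    T_domain = {frozenset(U) for U in T_domain}
    return all(frozenset(x for x in domain if f[x] in V) in T_domain
               for V in T_codomain)

S = {0, 1}
T = [set(), {1}, {0, 1}]          # a topology on a two-point set
identity = {0: 0, 1: 1}
swap = {0: 1, 1: 0}
print(is_topology(S, T))
print(is_continuous(identity, T, T, S), is_continuous(swap, T, T, S))
```

In this example the swap map fails to be continuous because the preimage of the open set {1} is {0}, which is not open.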

The generality of the continuity concept means that problems about the existence 
of continuous maps can be quite difficult. An example is the problem of invariance 
of dimension (nonexistence of a continuous bijection $\mathbb{R}^m \to \mathbb{R}^n$ for $m \ne n$), which
is nontrivial even for $m = 1$ and $n = 2$.


10.6 What Is Measure? 


As already mentioned in Sect. 10.4, the concept of measure leads to questions about 
functions (such as integrability) which have answers that depend on which axioms 
of set theory we accept. Thus, the concept of measure has been decisive in exposing
the role of set theory in analysis. Our original question (What is measure?) can be
answered fairly simply: it is what you get by approximating a set by finite unions of 
intervals (as explained in Sect. 9.3). 

The real question is: What is a measurable set? In other words, which sets can be 
approximated by finite unions of intervals? The answer depends on our axioms for 
set theory, and there are conflicting answers for AC versus AD. 

With full AC, there are nonmeasurable sets, as we saw in Sects. 9.6 and 9.7.
However, we cannot establish the complexity of these nonmeasurable sets without 
axioms that go beyond ZF+AC. The lowest possible complexity of nonmeasurable 
sets results from adding the axiom of constructibility due to Gödel (1939). As
mentioned in Sect. 7.3, this axiom says that each set has a definition in a language 
that includes symbols for ordinals (in addition to the symbols in the usual language 
for set theory). It follows that the definitions of sets can be well-ordered, which 
leads to an explicitly defined well-ordering of R. From this definition one can obtain 
definitions of nonmeasurable sets at the second level of the projective hierarchy. This 
is as low as we can go because sets at the first level can be proved measurable in 
ZF+AC, as we mentioned in Sect. 10.4. 

It is not known how high one can push the nonmeasurable sets while still 
retaining AC. The strongest result so far is that all projective sets can be proved 
measurable if we add to ZF+AC an axiom stating the existence of very “large” 
sets. This was proved by Martin and Steel (1989). As mentioned in Sect. 9.8, 
they actually proved determinacy of projective sets from the assumption that these 
“large” sets exist, whence measurability follows from the theorem of Mycielski and 
Świerczkowski (1964) that determinate sets are measurable. 

Thus, the theorem of Martin and Steel (1989) is also a theorem about the extent 
to which AD conflicts with AC: it says that AC is compatible with projective AD. 
In that respect it complements the much easier theorem proved in Sect. 7.7, which 
says that full AD is compatible with countable AC for sets of reals. 


10.7 What Does Analysis Want from R? 


Analysis wants to apply the limit concept to numbers and functions, so it wants 
limits to exist among the real numbers wherever possible. That is, it wants R to 
be complete. As we have seen, this is equivalent to the geometrical demand of 
having “no gaps” in the line, so in this respect geometry and analysis make the 
same demands on R. However, analysis has additional demands, such as having real 
numbers as the values of measures and integrals. As we saw in the previous section, 
this leads to the delicate question of deciding which sets are measurable. Deciding 
which functions are integrable is essentially the same question. 

A related, though less delicate, role for the real numbers is to serve as values of 
the distance function in a metric space. The completeness of R is also important in 
a metric space S, for example, in defining the lengths of curves in S. If C is a curve 
in S given by a continuous function f : [0,1] → S, it is natural to approximate C by 
a polygon with vertices P_0 = f(0), P_1 = f(x_1), P_2 = f(x_2), ..., P_n = f(1) that lie 
on C (with 0 < x_1 < x_2 < ... < 1), and to find the length of C by finding a limiting 
value of the polygon length 


d(P_0, P_1) + d(P_1, P_2) + ... + d(P_{n-1}, P_n) 


as the maximum side length tends to zero. If the limiting value exists, then the curve 
C is said to be rectifiable. 
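
The polygon approximation can be tried numerically. The following is only a minimal sketch under stated assumptions: the metric space is taken to be the Euclidean plane, the parameter values are equally spaced in [0,1], and the curve is a quarter of the unit circle. The names polygon_length and quarter_circle are hypothetical and chosen purely for illustration.

```python
import math

def polygon_length(f, n):
    """Length of the polygon with vertices f(0), f(1/n), ..., f(1).

    Assumes f maps [0,1] into the Euclidean plane, so the distance d
    is the usual straight-line distance.
    """
    points = [f(k / n) for k in range(n + 1)]
    # Sum d(P_0, P_1) + d(P_1, P_2) + ... + d(P_{n-1}, P_n).
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))

def quarter_circle(t):
    """Example curve (an assumption): a quarter of the unit circle."""
    angle = 0.5 * math.pi * t
    return (math.cos(angle), math.sin(angle))

if __name__ == "__main__":
    for n in (1, 10, 100, 1000):
        print(n, polygon_length(quarter_circle, n))
```

For this particular curve the polygon lengths increase with n and approach pi/2 from below, which is the length one expects for a quarter of the unit circle. The existence of that limiting value as a real number is exactly where the completeness of R enters.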

As the latter example shows, analysis (and topology) also wants R to serve as 
a model for continuous curves. That is, R (or [0,1]) is supposed to serve as the 
domain of the continuous function that defines the curve. Due to the generality of 
the concept of a continuous function, some “pathological” curves are admitted by 
this definition, such as space-filling curves and curves with no tangents. The latter 
curves also do not have finite length—if they have “length” at all, it is infinite. 

The concept of differentiability is one way to restrict the class of curves to 
more “natural” examples, since a differentiable curve has a tangent at each point 
by definition. (Indeed, differentiability can be viewed as the property of a curve 
approximating its tangent under indefinite magnification.) One also finds that 
differentiable curves are rectifiable, and there is a natural formula for their length. 
But here, too, the completeness of R is crucial. Length is the limit of a sequence of 
real numbers, and one needs completeness to be sure that the limit exists. 
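
For concreteness, the length formula can be written out under assumptions that are added here and are slightly stronger than bare differentiability: the curve lies in Euclidean space R^n, and f is continuously differentiable on [0,1]. In that setting the formula is usually stated as follows, where the norm symbol denotes Euclidean length and the P_i are polygon vertices f(x_i) on the curve.

```latex
\[
  \operatorname{length}(C)
  \;=\; \lim \sum_{i=1}^{n} d(P_{i-1}, P_i)
  \;=\; \int_0^1 \lVert f'(t) \rVert \, dt ,
\]
% the limit is taken as the largest side length of the polygon tends to zero
```

As a check, the quarter circle f(t) = (cos(πt/2), sin(πt/2)) has ‖f'(t)‖ = π/2 for every t, so the integral gives length π/2, as expected for a quarter of the unit circle.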


10.8 Further Reading 


The following, mostly modern, books are suggested for further exploration of the 
ideas in this book. I believe that it will also help to dip into the classics, most 
of which are available in English translation. Many translations are listed in the 
bibliography, and a particularly good anthology of them is Ewald (1996). 


10.8.1 Greek Mathematics 


The works of Euclid and Archimedes contain the most important mathematics that 
survives from ancient Greece, and they may be read in several different editions. 
There are also some important works inspired by Euclid, such as the Hilbert (1899) 
Foundations of Geometry, which analyzes the logic of Euclid’s geometry and fills 
its gaps. 

Heath’s 1925 edition of Euclid’s Elements, reprinted as Euclid (1956), is 
beginning to show its age, but it is still widely available and worth reading, if only 
for Heath’s rich and extensive commentary. 

Two valuable supplements to Euclid are the books of Artmann (1999) and 
Hartshorne (2000). Artmann, like Heath, is well versed in the history of Greek 
mathematics, but with a more modern perspective. Hartshorne focuses on the 
transformation of Euclid’s ideas by Hilbert, giving a rigorous, modern approach 
to geometry and its algebraic foundations. 

As far as Archimedes is concerned, the most complete edition is still that of 
Heath (1897). Like Heath’s Euclid, it is still worth reading, though it may soon be 
replaced as a result of recent scholarship. Reviel Netz is preparing a new three- 
volume edition, and one volume has appeared so far: Archimedes (2004). 


10.8.2 The Number Concept 


An excellent book devoted entirely to the number concept (and its higher- 
dimensional generalizations) is Numbers by Ebbinghaus et al. (1991). The first 
two chapters, in particular, give an expanded account of what was covered in 
Chap. 1 of this book. 

After that, you may dare to read Landau (1951). Although entitled Foundations 
of Analysis, the book is really about foundations for the real and complex numbers 
(though I agree that this is an important part of the foundations of analysis). 


10.8.3 Analysis 


Understanding Analysis by Abbott (2001) is a pleasant undergraduate text that 
makes good use of the sequential continuity concept, though without discussing the 
related issues of set theory. Abbott also has a nice treatment of functions continuous 
almost everywhere and the Riemann integral, which I have drawn on in Chap. 9 of 
this book. 

At a more advanced level is the recently reissued classic Pure Mathematics of 
Hardy (2008), which I used as an undergraduate. It involves more manipulation of 
formulas than is usually required these days, but it also insists on sound foundations, 
with Dedekind cuts in Chap. 2. 

Still at an advanced undergraduate level, but aimed at Lebesgue integration, is 
Bressoud (2008). This book also includes a very attractive history of the subject, 
woven into the mathematical development. 

In addition to the book of Bressoud just mentioned, for the history of analysis 
I recommend Hairer and Wanner (1996) and Jahnke (2003). Hairer and Wanner is 
an extremely well-illustrated book, covering mainly the sixteenth to the nineteenth 
century. The book edited by Jahnke is a collection of articles, of which those on the 
nineteenth century are particularly relevant. 


10.8.4 Set Theory 


Set theory is, alas, not a standard undergraduate subject, so one does not find 
many introductory books on it. For a very entertaining introduction see In Search 
of Infinity by Vilenkin (1995), then try the Naive Set Theory of Halmos (1960). 
Although suitable for beginners, Naive Set Theory is nevertheless a rigorous 
introduction to the ZF axioms. I might also mention Stillwell (2010), for connections 
between set theory and logic. 

It is a huge step from this level to, say, Solovay’s theorem that it is consistent 
with ZF to assume that all sets of real numbers are Lebesgue measurable. But there 
is a book that will enable you to take this step when you are ready: Jech (2003). 
Before doing that, a good intermediate step may be reading Cohen (1966), which 
introduces the forcing method that made modern set theory possible. 

The history of set theory has been well served by Ferreirós (1999) and Kanamori 
(2003). Ferreirós is a very rich and detailed history from the beginnings to the ZF 
axioms; Kanamori (2003) is an advanced book on large cardinals interwoven with a 
history of the subject. Kanamori has also written excellent historical articles on set 
theory, such as Kanamori (1996). 


10.8.5 Axiom of Choice 


It is probably best to begin studying AC through its history in Zermelo’s Axiom 
of Choice by Moore (1982). Only then can one appreciate the difficulty of even 
noticing AC in the early days of set theory. After that, try the comprehensive books 
of Jech (1973) and Herrlich (2006). 